Configuration Reference¶
All configuration is managed through environment variables loaded from the .env file.
Environment Variables¶
Required Variables¶
| Variable | Description |
|---|---|
SCALEDOWN_API_KEY |
Required. Your ScaleDown API key from ScaleDown. Used for context compression and fallback generation. |
GEMINI_API_KEY |
Required. Your Google Gemini API key from AI Studio. Used for question triage, keyword extraction, COT reasoning, verification, and critique. |
Optional Configuration¶
| Variable | Default | Description |
|---|---|---|
SCALEDOWN_MODEL |
gemini-2.5-flash |
Target model for ScaleDown compression optimization. ScaleDown will optimize the tokenization for this specific model's tokenizer. Valid values: gpt-4o, claude-3-5-sonnet, gemini-2.5-flash, etc. |
GEMINI_MODEL |
gemini-2.5-flash |
The Gemini model to use for generation. Options: gemini-2.5-flash, gemini-1.5-flash, gemini-1.5-pro, etc. Flash models are faster/cheaper but less capable than Pro models. |
SCALEDOWN_TIMEOUT |
15 |
Timeout in seconds for ScaleDown API calls. Increase if you see timeout errors on slow networks. |
CHUNK_SIZE |
1000 |
Number of characters per text chunk when splitting papers. Larger chunks = more context per chunk but fewer chunks retrieved. |
CHUNK_OVERLAP |
200 |
Number of overlapping characters between adjacent chunks. Prevents information loss at chunk boundaries. |
TOP_K |
5 |
Number of chunks to retrieve per RAG query. Higher = more context but also more tokens and potential noise. |
Model Selection¶
ScaleDown Model¶
The SCALEDOWN_MODEL variable tells ScaleDown which tokenizer to optimize for. Use:
- gemini-2.5-flash if you're using Gemini 2.5 Flash (default)
- gpt-4o if you're using OpenAI's GPT-4o
- claude-3-5-sonnet if you're using Claude 3.5 Sonnet
This does NOT change which model ScaleDown uses internally — it only optimizes the compression output for your target model's tokenizer.
Gemini Model¶
The GEMINI_MODEL variable selects which Gemini model to use for generation:
- gemini-2.5-flash (default): Fastest, cheapest, good quality
- gemini-1.5-flash: Previous generation, slower than 2.5
- gemini-1.5-pro: Much smarter but slower and more expensive
RAG Configuration¶
Chunk Size¶
The CHUNK_SIZE controls how large each text chunk is:
- Too small (e.g., 200): Chunks lose semantic meaning, context fragmentation
- Too large (e.g., 5000): Fewer chunks retrieved, may miss relevant details
- Default 1000: Good balance for most scientific papers
Chunk Overlap¶
The CHUNK_OVERLAP ensures no information is lost at boundaries:
- No overlap (0): Risk of splitting sentences/paragraphs
- Too much overlap (500+): Redundant content, wasted tokens
- Default 200: Usually captures 1-2 sentences of overlap
Top-K¶
The TOP_K controls how many chunks are retrieved:
- Too few (<3): May miss important information
- Too many (>10): More noise, higher token costs
- Default 5: Works well for most questions
Example .env¶
# Required
SCALEDOWN_API_KEY=sk_sd_abc123...
GEMINI_API_KEY=AIza...
# Optional - Uncomment to override defaults
# SCALEDOWN_MODEL=gemini-2.5-flash
# GEMINI_MODEL=gemini-2.5-flash
# SCALEDOWN_TIMEOUT=15
# CHUNK_SIZE=1000
# CHUNK_OVERLAP=200
# TOP_K=5
Next: Usage Guide¶
See Usage Guide for all available commands and examples.