
Configuration Reference

All configuration is managed through environment variables loaded from the .env file.
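In Python, a .env file is typically loaded with a library such as python-dotenv. The minimal parser below is only an illustration of what that loading step does; the project's actual loader may differ:

```python
import os

def load_env(path=".env"):
    """Minimal illustrative .env loader: KEY=VALUE lines become
    environment variables. Real loaders (e.g. python-dotenv) also
    handle quoting, export prefixes, and other edge cases."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blanks, comments, and malformed lines.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Don't clobber variables already set in the shell.
            os.environ.setdefault(key.strip(), value.strip())
```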


Environment Variables

Required Variables

| Variable | Description |
| --- | --- |
| `SCALEDOWN_API_KEY` | **Required.** Your ScaleDown API key. Used for context compression and fallback generation. |
| `GEMINI_API_KEY` | **Required.** Your Google Gemini API key from AI Studio. Used for question triage, keyword extraction, chain-of-thought (CoT) reasoning, verification, and critique. |

Optional Configuration

| Variable | Default | Description |
| --- | --- | --- |
| `SCALEDOWN_MODEL` | `gemini-2.5-flash` | Target model for ScaleDown compression optimization. ScaleDown optimizes the compressed output for this model's tokenizer. Valid values include `gpt-4o`, `claude-3-5-sonnet`, `gemini-2.5-flash`, etc. |
| `GEMINI_MODEL` | `gemini-2.5-flash` | The Gemini model used for generation. Options include `gemini-2.5-flash`, `gemini-1.5-flash`, `gemini-1.5-pro`, etc. Flash models are faster and cheaper but less capable than Pro models. |
| `SCALEDOWN_TIMEOUT` | `15` | Timeout in seconds for ScaleDown API calls. Increase it if you see timeout errors on slow networks. |
| `CHUNK_SIZE` | `1000` | Characters per text chunk when splitting papers. Larger chunks carry more context each, but the paper splits into fewer, coarser chunks. |
| `CHUNK_OVERLAP` | `200` | Overlapping characters between adjacent chunks. Prevents information loss at chunk boundaries. |
| `TOP_K` | `5` | Number of chunks retrieved per RAG query. Higher values add context but also more tokens and potential noise. |
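One way these optional variables can be consumed, using the defaults from the table above, is sketched below. `load_settings` is a hypothetical helper for illustration, not necessarily the project's actual code:

```python
import os

def load_settings():
    """Read the optional settings, falling back to the documented
    defaults. Hypothetical helper; the project's real loader may differ."""
    return {
        "scaledown_model": os.getenv("SCALEDOWN_MODEL", "gemini-2.5-flash"),
        "gemini_model": os.getenv("GEMINI_MODEL", "gemini-2.5-flash"),
        "scaledown_timeout": int(os.getenv("SCALEDOWN_TIMEOUT", "15")),
        "chunk_size": int(os.getenv("CHUNK_SIZE", "1000")),
        "chunk_overlap": int(os.getenv("CHUNK_OVERLAP", "200")),
        "top_k": int(os.getenv("TOP_K", "5")),
    }
```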

Model Selection

ScaleDown Model

The SCALEDOWN_MODEL variable tells ScaleDown which tokenizer to optimize for. Use:

- `gemini-2.5-flash` if you're using Gemini 2.5 Flash (default)
- `gpt-4o` if you're using OpenAI's GPT-4o
- `claude-3-5-sonnet` if you're using Claude 3.5 Sonnet

This does NOT change which model ScaleDown uses internally — it only optimizes the compression output for your target model's tokenizer.

Gemini Model

The GEMINI_MODEL variable selects which Gemini model to use for generation:

- `gemini-2.5-flash` (default): fastest, cheapest, good quality
- `gemini-1.5-flash`: previous generation, slower than 2.5
- `gemini-1.5-pro`: much smarter but slower and more expensive


RAG Configuration

Chunk Size

The CHUNK_SIZE setting controls how large each text chunk is:

- Too small (e.g., 200): chunks lose semantic meaning; context fragments
- Too large (e.g., 5000): fewer chunks are retrieved, and relevant details may be missed
- Default 1000: a good balance for most scientific papers

Chunk Overlap

The CHUNK_OVERLAP setting ensures no information is lost at boundaries:

- No overlap (0): risk of splitting sentences or paragraphs
- Too much overlap (500+): redundant content and wasted tokens
- Default 200: usually captures 1-2 sentences of overlap
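The interaction of the two settings can be sketched with a simple fixed-size splitter. This is an illustration only; the project's actual splitter may additionally respect sentence or paragraph boundaries:

```python
def split_text(text, chunk_size=1000, chunk_overlap=200):
    """Fixed-size splitter sketch: each new chunk starts
    chunk_size - chunk_overlap characters after the previous one,
    so adjacent chunks share chunk_overlap characters."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```

With the defaults, a 2,500-character paper yields chunks starting at offsets 0, 800, 1600, and 2400, each repeating the last 200 characters of its predecessor.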

Top-K

The TOP_K setting controls how many chunks are retrieved:

- Too few (<3): may miss important information
- Too many (>10): more noise and higher token costs
- Default 5: works well for most questions
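Retrieval then reduces to ranking chunk embeddings by similarity to the query and keeping the best TOP_K. The dot-product ranking below is a generic sketch, not necessarily the similarity metric this project uses:

```python
def top_k_chunks(query_vec, chunk_vecs, top_k=5):
    """Return the indices of the top_k chunks most similar to the
    query, ranked by dot-product similarity (illustrative sketch)."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    ranked = sorted(range(len(chunk_vecs)),
                    key=lambda i: dot(query_vec, chunk_vecs[i]),
                    reverse=True)
    return ranked[:top_k]
```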


Example .env

```
# Required
SCALEDOWN_API_KEY=sk_sd_abc123...
GEMINI_API_KEY=AIza...

# Optional - uncomment to override defaults
# SCALEDOWN_MODEL=gemini-2.5-flash
# GEMINI_MODEL=gemini-2.5-flash
# SCALEDOWN_TIMEOUT=15
# CHUNK_SIZE=1000
# CHUNK_OVERLAP=200
# TOP_K=5
```

Next: Usage Guide

See Usage Guide for all available commands and examples.