Limitations¶
Understanding the constraints and trade-offs of the current system.
ArXiv-Only Source¶
Current State¶
- Only ArXiv papers are supported as primary sources
- Cannot fetch papers from IEEE, ACM, Springer, PubMed, or other academic databases
- No access to commercial journals or paywalled content
Why This Matters¶
- Limited Coverage: Many important papers are not on ArXiv (especially older works, industry research, medical journals)
- Recency Bias: ArXiv focuses on preprints, which may not be peer-reviewed
- Domain Gaps: Medicine, biology, and some engineering fields have less ArXiv coverage
ArXiv API Constraints¶
- Rate Limits: The API is called without authentication, so requests are subject to its basic rate limiting
- Keyword Search Only: Results sorted by basic keyword matching, not semantic relevance
- No Full-Text Search: Can only search titles, abstracts, authors, categories
- Metadata Only: API returns metadata; PDFs must be downloaded separately
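As a rough illustration of these constraints, a keyword query against the public arXiv API could be built as follows. The endpoint and the `search_query`, `start`, `max_results`, and `sortBy` parameters are part of the arXiv API. The helper names and the 3-second pause are assumptions made for this sketch and are not taken from this project's code.

```python
import time
import urllib.parse
import urllib.request

ARXIV_API = "http://export.arxiv.org/api/query"


def build_query_url(keywords, max_results=5):
    """Build a keyword query URL (hypothetical helper, single-word keywords)."""
    # "all:" searches every metadata field; there is no full-text field.
    search_query = " AND ".join(f"all:{kw}" for kw in keywords)
    params = {
        "search_query": search_query,
        "start": 0,
        "max_results": max_results,
        "sortBy": "relevance",
    }
    return f"{ARXIV_API}?{urllib.parse.urlencode(params)}"


def fetch_metadata(keywords, pause_seconds=3.0):
    """Return the raw Atom XML feed of metadata for the query."""
    # Assumed polite delay between unauthenticated requests.
    time.sleep(pause_seconds)
    with urllib.request.urlopen(build_query_url(keywords), timeout=30) as resp:
        return resp.read().decode("utf-8")
```

The response contains only metadata entries (title, abstract, authors, categories, links), so each PDF still has to be fetched with a separate request.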
PDF Extraction Quality¶
- Heavily formatted papers: Tables, graphs, and complex layouts often extracted poorly by PyPDF2
- Mathematical notation: Equations frequently garbled or unreadable
- Figures: Images and diagrams completely lost
- Multi-column layouts: Column order sometimes mixed up
ScaleDown API Constraints¶
Compression-Only Service¶
- ScaleDown is not an LLM — it cannot generate free-form answers
- The "fallback generation" is really compressed extraction, not true generation
- Cannot answer questions that require reasoning beyond the provided context
Compression Quality¶
- Very short texts (<200 chars): Skipped, no compression applied
- Highly technical content: May lose nuance when compressed by 40-60%
- Query dependency: Compression quality depends on how well the user question captures their intent
Latency¶
- Each API call adds 1-3 seconds of latency
- Multiple compression calls (context + artifacts) → 5-10s total
- No batching → sequential calls
Cost¶
- Requires a valid API key — no free tier
- Usage-based pricing (per token compressed)
Gemini Free Tier Limitations¶
Rate Limits¶
- The free Gemini API has strict requests-per-minute and tokens-per-day limits
- Heavy usage triggers 429 errors
- Each question with full pipeline = 3-4 Gemini API calls (triage, COT, verify, critique)
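One common way to cope with 429 responses is to retry with exponential backoff. The sketch below is a generic pattern, not this project's implementation: `RateLimitError` is a hypothetical stand-in for whatever exception the Gemini client raises on HTTP 429, and the delay values are assumptions.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the client's HTTP 429 exception."""


def call_with_backoff(call, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Run call(), retrying on rate-limit errors with growing delays."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            # Delay doubles each attempt, plus a little random jitter.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            sleep(delay)
```

Because one question can trigger several Gemini calls, each wrapped call may wait independently, so backoff trades fewer failures for higher total latency.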
Model Capability¶
- Gemini 2.5 Flash: Fast but not as capable as Pro models for complex multi-hop reasoning
- Thinking Budget: The thinkingConfig parameter caps internal reasoning, potentially reducing quality on highly complex questions
- Citation Accuracy: Even with strict prompts, the model sometimes hallucinates citations or misattributes sources
Context Window¶
- While technically large (1M+ tokens), the effective context is limited by:
  - Cost (more tokens = higher API cost)
  - Quality degradation with very long contexts
  - Latency (longer contexts → slower responses)
RAG Limitations¶
TF-IDF Retrieval¶
- Keyword-based, not semantic
- Misses relevant chunks that use different terminology (synonym problem)
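- Example: Query "neural nets" overlaps with "artificial neural networks" only on the token "neural"; "nets" and "networks" are treated as unrelated terms, so the chunk scores lower than its relevance warrants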
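The terminology mismatch can be reproduced with a small, dependency-free TF-IDF scorer. This is a minimal sketch, assuming whitespace tokenization and a smoothed IDF formula; the project's actual tokenizer and weighting may differ, and all function names here are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def tfidf_scores(query, chunks):
    """Cosine similarity between the query and each chunk under TF-IDF."""
    tokenized = [tokenize(c) for c in chunks]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed inverse document frequency (an assumed variant).
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}

    def vectorize(tokens):
        # Terms outside the indexed vocabulary carry no weight.
        return {t: tf * idf[t] for t, tf in Counter(tokens).items() if t in idf}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        na = math.sqrt(sum(w * w for w in a.values()))
        nb = math.sqrt(sum(w * w for w in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    q = vectorize(tokenize(query))
    return [cosine(q, vectorize(toks)) for toks in tokenized]


chunks = [
    "artificial neural networks learn layered representations",
    "neural nets are trained with backpropagation",
]
scores = tfidf_scores("neural nets", chunks)
```

In this toy index the first chunk matches the query only through the shared token "neural" and therefore ranks below the second, while a query with no shared tokens (for example "deep learning") scores 0 against every chunk.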
Fixed Chunk Sizes¶
- Document structure is ignored, so chunks may cut through:
  - Sentences
  - Paragraphs
  - Tables
  - Equations
  - Section boundaries
- Context fragmentation can break semantic meaning
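A minimal sketch of fixed-size chunking shows how such cuts arise. It assumes chunks are measured in characters; the project's real chunk size, overlap, and unit are not stated here, so the values below are purely illustrative.

```python
def fixed_size_chunks(text, size=20, overlap=0):
    """Split text into windows of `size` characters (hypothetical helper)."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    # Boundaries fall wherever the character count says, not at sentence ends.
    return [text[i:i + size] for i in range(0, len(text), step)]


text = "Attention is all you need. The model uses self-attention."
chunks = fixed_size_chunks(text, size=20)
```

With these values the first sentence is split across two chunks, so neither chunk alone carries its full meaning. Adding overlap reduces this effect but does not make the boundaries structure-aware.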
No Re-Ranking¶
- Retrieved chunks are scored solely by TF-IDF cosine similarity
- No cross-encoder or LLM-based re-ranking is applied
- First-stage retrieval is final — no second-pass refinement
Source Tracking¶
- Citations are at the paper level, not page/section level
- Example: [arxiv:1706.03762] (but which part of the paper?)
- No automatic extraction of section/page metadata from chunks
General Limitations¶
No Real-Time Data¶
- Only papers already on ArXiv
- No other preprint servers (bioRxiv, medRxiv, SSRN, etc.)
- No blogs, conference talks, or live research
Single Language¶
- English papers only
- No multilingual support
- Papers in other languages will be extracted but likely produce poor results
No Figure/Image Analysis¶
- Extracted text doesn't include figures or diagrams
- Cannot answer questions like "What does Figure 3 show?"
- No vision model integration
Session State¶
- Sessions stored as JSON files on disk, not in a database
- No multi-user support
- No cloud synchronization
- Sessions lost if files are deleted
No Evaluation Framework¶
- No automated hallucination detection
- No quantified quality metrics
- No benchmark datasets
- Manual verification required
Performance Limitations¶
Latency¶
Full pipeline with paper discovery:
- Triage + keyword extraction: ~2s
- ArXiv search + PDF download: ~10-20s (parallel)
- Text extraction + chunking + indexing: ~5s
- Retrieval + compression: ~3s
- COT generation: ~10-15s
- Verification: ~5-8s
- Critique: ~5-8s
- Total: ~45-65 seconds

Direct answer (general question):
- Triage: ~2s
- Direct generation: ~5s
- Total: ~7 seconds
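As a quick sanity check, the full-pipeline stage estimates can be summed directly. The figures below are copied from the list above; the dictionary keys and the helper name are just labels for this sketch.

```python
# (low, high) estimates in seconds for each full-pipeline stage.
STAGES = {
    "triage + keyword extraction": (2, 2),
    "arxiv search + pdf download": (10, 20),
    "text extraction + chunking + indexing": (5, 5),
    "retrieval + compression": (3, 3),
    "cot generation": (10, 15),
    "verification": (5, 8),
    "critique": (5, 8),
}


def total_range(stages):
    """Sum the low and high estimates across sequential stages."""
    low = sum(lo for lo, _ in stages.values())
    high = sum(hi for _, hi in stages.values())
    return low, high


low, high = total_range(STAGES)
```

The itemized stages add up to roughly 40-61 seconds, slightly below the stated ~45-65 second total; the difference presumably reflects overhead that is not itemized, though that is an assumption.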
Throughput¶
- Single-threaded execution (no parallel Gemini calls)
- No response streaming (wait for full response)
- Rate limits restrict concurrent users
Security & Privacy¶
API Keys in .env¶
- Keys stored in plain text
- No encryption at rest
- Accidental git commits expose keys (mitigated by .gitignore)
No Authentication¶
- CLI tool has no user authentication
- Anyone with file system access can:
  - View sessions
  - Read artifacts
  - Use your API keys
Data Storage¶
- Papers, artifacts, sessions stored locally
- No data encryption
- No automatic cleanup of old data
Next: Possible Improvements¶
See Improvements for ideas to address these limitations.