
Limitations

Understanding the constraints and trade-offs of the current system.


ArXiv-Only Source

Current State

  • Only ArXiv papers are supported as primary sources
  • Cannot fetch papers from IEEE, ACM, Springer, PubMed, or other academic databases
  • No access to commercial journals or paywalled content

Why This Matters

  • Limited Coverage: Many important papers are not on ArXiv (especially older works, industry research, medical journals)
  • Recency Bias: ArXiv focuses on preprints, which may not be peer-reviewed
  • Domain Gaps: Medicine, biology, and some engineering fields have less ArXiv coverage

ArXiv API Constraints

  • Rate Limits: The API is unauthenticated; ArXiv asks clients to make roughly one request every three seconds (see the sketch after this list)
  • Keyword Search Only: Results sorted by basic keyword matching, not semantic relevance
  • No Full-Text Search: Can only search titles, abstracts, authors, categories
  • Metadata Only: API returns metadata; PDFs must be downloaded separately
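
For reference, a minimal search sketch, assuming the community `arxiv` Python package (the project may instead call the raw export API directly; the query below is illustrative):

```python
import arxiv  # pip install arxiv -- a thin wrapper over export.arxiv.org

# ArXiv asks unauthenticated clients to wait ~3 seconds between requests;
# the Client enforces that delay and retries transient failures.
client = arxiv.Client(page_size=25, delay_seconds=3.0, num_retries=3)

# Keyword search only: matches titles, abstracts, authors, and categories,
# with no semantic understanding of the query.
search = arxiv.Search(
    query="ti:transformer AND cat:cs.CL",
    max_results=5,
    sort_by=arxiv.SortCriterion.Relevance,
)

for result in client.results(search):
    print(result.entry_id, result.title)
    # The API returns metadata only; the PDF is a separate download:
    # result.download_pdf(dirpath="papers/")
```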

PDF Extraction Quality

  • Heavily formatted papers: Tables, graphs, and complex layouts often extracted poorly by PyPDF2
  • Mathematical notation: Equations frequently garbled or unreadable
  • Figures: Images and diagrams completely lost
  • Multi-column layouts: Text can come out in content-stream order rather than reading order, interleaving columns
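
A minimal extraction sketch showing where these failures originate (the file path is illustrative):

```python
from PyPDF2 import PdfReader  # pip install PyPDF2

reader = PdfReader("paper.pdf")  # illustrative path
for page in reader.pages:
    # extract_text() emits text in the PDF's internal content-stream order,
    # not visual reading order: two-column pages can interleave columns,
    # equations degrade into loose glyphs, and figures are dropped entirely.
    print(page.extract_text())
```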

ScaleDown API Constraints

Compression-Only Service

  • ScaleDown is not an LLM — it cannot generate free-form answers
  • The "fallback generation" is really compressed extraction, not true generation
  • Cannot answer questions that require reasoning beyond the provided context

Compression Quality

  • Very short texts (<200 chars): Skipped entirely; no compression applied (see the guard sketch after this list)
  • Highly technical content: May lose nuance when compressed 40-60%
  • Query dependency: Compression quality depends on how well the user's question captures their intent
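
A minimal sketch of the skip guard described above; `compress_via_scaledown` is a hypothetical stand-in, since ScaleDown's real client and signature may differ:

```python
MIN_COMPRESS_CHARS = 200  # documented threshold: shorter texts are skipped

def compress_via_scaledown(text: str, query: str, target_ratio: float = 0.5) -> str:
    """Hypothetical stand-in for the ScaleDown client call."""
    ...  # one HTTP round-trip per text; this is where the 1-3s latency accrues

def maybe_compress(text: str, query: str) -> str:
    if len(text) < MIN_COMPRESS_CHARS:
        return text  # skip guard: returned unchanged, no API call made
    return compress_via_scaledown(text, query=query)
```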

Latency

  • Each API call adds 1-3 seconds of latency
  • Multiple compression calls (context + artifacts) → 5-10s total
  • No batching → sequential calls

Cost

  • Requires a valid API key — no free tier
  • Usage-based pricing (per token compressed)

Gemini Free Tier Limitations

Rate Limits

  • The free Gemini API enforces strict requests-per-minute and tokens-per-day limits
  • Heavy usage triggers 429 (rate-limit) errors; a backoff sketch follows this list
  • Each question with full pipeline = 3-4 Gemini API calls (triage, COT, verify, critique)
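
One common mitigation is retrying with exponential backoff. A minimal, SDK-agnostic sketch; `call` stands in for whatever function issues the Gemini request, and the string check should be replaced with the SDK's actual rate-limit exception:

```python
import random
import time

def call_with_backoff(call, max_retries: int = 5):
    """Retry a Gemini call on 429 responses with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return call()
        except Exception as exc:  # narrow this to the SDK's rate-limit error
            if "429" not in str(exc) or attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt + random.random())  # ~1s, 2s, 4s, ...
```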

Model Capability

  • Gemini 2.5 Flash: Fast but not as capable as Pro models for complex multi-hop reasoning
  • Thinking Budget: The thinkingConfig parameter caps internal reasoning, potentially reducing quality on highly complex questions (see the sketch after this list)
  • Citation Accuracy: Even with strict prompts, the model sometimes hallucinates citations or misattributes sources
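
A minimal sketch of setting the budget, assuming the google-genai Python SDK (the project's actual wrapper may differ):

```python
from google import genai
from google.genai import types  # pip install google-genai

client = genai.Client()  # reads GEMINI_API_KEY / GOOGLE_API_KEY from the environment

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Summarize the retrieved context...",  # illustrative prompt
    config=types.GenerateContentConfig(
        # Caps internal reasoning tokens; lower budgets trade answer
        # quality on complex questions for speed and cost.
        thinking_config=types.ThinkingConfig(thinking_budget=1024),
    ),
)
print(response.text)
```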

Context Window

  • While technically large (1M+ tokens), the effective context is limited by:
      • Cost (more tokens = higher API cost)
      • Quality degradation with very long contexts
      • Latency (longer contexts → slower responses)

RAG Limitations

TF-IDF Retrieval

  • Keyword-based, not semantic
  • Misses relevant chunks that use different terminology (synonym problem)
  • Example: Query "neural nets" gets only partial credit against "artificial neural networks" ("nets" and "networks" are distinct tokens), and true synonyms score zero, as the sketch below shows
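
A runnable demonstration of the synonym problem with scikit-learn (the chunk texts are invented for illustration):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Both chunks are about neural networks, but only one shares a token
# with the query; TF-IDF has no notion of synonymy.
chunks = [
    "Artificial neural networks approximate nonlinear functions.",
    "Connectionist models are trained with backpropagation.",
]
vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(chunks)

query_vec = vectorizer.transform(["neural nets"])
print(cosine_similarity(query_vec, doc_matrix))
# roughly [[0.41, 0.0]]: partial credit for the shared token "neural",
# nothing for "nets" vs "networks", and zero for the synonym chunk.
```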

Fixed Chunk Sizes

  • No respect for document structure — chunks may cut through:
      • Sentences
      • Paragraphs
      • Tables
      • Equations
      • Section boundaries
  • Context fragmentation can break semantic meaning
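
A character-window sketch of the failure mode (sizes are illustrative; the real chunker's parameters may differ):

```python
def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Fixed-size character windows; boundaries ignore document structure."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

sample = (
    "Section 3.2 defines attention as softmax(QK^T / sqrt(d_k))V, "
    "which weights each value by query-key similarity. "
) * 5
for chunk in chunk_text(sample, size=80, overlap=10)[:3]:
    print(repr(chunk))  # chunks routinely end mid-sentence or mid-equation
```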

No Re-Ranking

  • Retrieved chunks are scored solely by TF-IDF cosine similarity
  • No cross-encoder or LLM-based re-ranking is applied
  • First-stage retrieval is final — no second-pass refinement

Source Tracking

  • Citations are at the paper level, not page/section level
  • Example: [arxiv:1706.03762] — but which part of the paper?
  • No automatic extraction of section/page metadata from chunks

General Limitations

No Real-Time Data

  • Only papers already on ArXiv
  • No other preprint servers (bioRxiv, medRxiv, SSRN, etc.)
  • No blogs, conference talks, or live research

Single Language

  • English papers only
  • No multilingual support
  • Papers in other languages will be extracted but likely produce poor results

No Figure/Image Analysis

  • Extracted text doesn't include figures or diagrams
  • Cannot answer questions like "What does Figure 3 show?"
  • No vision model integration

Session State

  • Sessions stored as JSON files on disk, not in a database
  • No multi-user support
  • No cloud synchronization
  • Sessions lost if files are deleted

No Evaluation Framework

  • No automated hallucination detection
  • No quantified quality metrics
  • No benchmark datasets
  • Manual verification required

Performance Limitations

Latency

Full pipeline with paper discovery:

  • Triage + keyword extraction: ~2s
  • ArXiv search + PDF download: ~10-20s (parallel)
  • Text extraction + chunking + indexing: ~5s
  • Retrieval + compression: ~3s
  • COT generation: ~10-15s
  • Verification: ~5-8s
  • Critique: ~5-8s
  • Total: ~45-65 seconds

Direct answer (general question):

  • Triage: ~2s
  • Direct generation: ~5s
  • Total: ~7 seconds

Throughput

  • Single-threaded execution (no parallel Gemini calls)
  • No response streaming (wait for full response)
  • Rate limits restrict concurrent users

Security & Privacy

API Keys in .env

  • Keys stored in plain text
  • No encryption at rest
  • Accidental git commits expose keys (mitigated by .gitignore)
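
For context, the typical loading pattern, assuming python-dotenv (the variable names are illustrative, not necessarily the project's actual ones):

```python
import os
from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # reads key=value pairs from ./.env into the process environment

# Illustrative names; nothing here is encrypted -- .env is plain text on disk.
gemini_key = os.environ.get("GEMINI_API_KEY")
scaledown_key = os.environ.get("SCALEDOWN_API_KEY")
```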

No Authentication

  • CLI tool has no user authentication
  • Anyone with file system access can:
      • View sessions
      • Read artifacts
      • Use your API keys

Data Storage

  • Papers, artifacts, sessions stored locally
  • No data encryption
  • No automatic cleanup of old data

Next: Possible Improvements

See Improvements for ideas to address these limitations.