Architecture Overview¶
The Scientific Literature Explorer uses a modular pipeline architecture that combines retrieval-augmented generation, context compression, and multi-stage verification to produce accurate, well-cited answers from scientific literature.
System Architecture¶
```mermaid
%%{init: {'theme':'base','themeVariables':{'fontFamily':'sans-serif'}}}%%
flowchart TD
Start([User Question]) --> Triage[Research Agent<br/>Triage + Keywords<br/>Gemini]
Triage -->|General Question| DirectAnswer[Direct Answer<br/>Gemini COT]
Triage -->|Research Question| ArXiv[ArXiv API Search]
ArXiv --> PDFs[Parallel PDF Download<br/>& Text Extraction<br/>PyPDF2]
PDFs --> RAG[RAG Pipeline<br/>Chunk → TF-IDF → Retrieve]
RAG --> Compress[ScaleDown API<br/>Context Compression<br/>40-60% reduction]
Compress --> Workflow[Anti-Hallucination Workflow]
DirectAnswer --> Workflow
subgraph Workflow [Reasoning Pipeline]
COT[Chain-of-Thought<br/>Gemini 2048 tokens<br/>Strict Citations]
Verify[Self-Verification<br/>Gemini Fast 1024 tokens<br/>Citation Check]
Critique[Self-Critique<br/>Gemini Fast 1024 tokens<br/>Quality Review]
COT --> Verify
Verify --> Critique
end
Workflow --> Store[Artifact Storage<br/>Compressed Markdown]
Store --> Session[(Session JSON<br/>Conversation History)]
Session --> Output([Answer + Citations])
style Start fill:#1976d2,stroke:#0d47a1,stroke-width:3px,color:#fff,rx:10,ry:10
style Output fill:#2e7d32,stroke:#1b5e20,stroke-width:3px,color:#fff,rx:10,ry:10
style Compress fill:#f57c00,stroke:#e65100,stroke-width:3px,color:#fff,rx:5,ry:5
style Workflow fill:#7b1fa2,stroke:#4a148c,stroke-width:2px,color:#fff,rx:5,ry:5
style Triage fill:#d32f2f,stroke:#b71c1c,stroke-width:3px,color:#fff,rx:5,ry:5
style DirectAnswer fill:#0288d1,stroke:#01579b,stroke-width:2px,color:#fff,rx:5,ry:5
style ArXiv fill:#0288d1,stroke:#01579b,stroke-width:2px,color:#fff,rx:5,ry:5
style PDFs fill:#0288d1,stroke:#01579b,stroke-width:2px,color:#fff,rx:5,ry:5
style RAG fill:#0288d1,stroke:#01579b,stroke-width:2px,color:#fff,rx:5,ry:5
style Store fill:#388e3c,stroke:#1b5e20,stroke-width:2px,color:#fff,rx:5,ry:5
style Session fill:#5d4037,stroke:#3e2723,stroke-width:2px,color:#fff,rx:5,ry:5
linkStyle default stroke:#e0e0e0,stroke-width:2px
```
Core Components¶
1. Research Agent (Triage + Discovery)¶
Purpose: Intelligently routes questions and discovers relevant papers
- Question Classification: Uses Gemini to classify each question as `general`, `conceptual`, or `research`
- Keyword Extraction: Extracts search terms and an ArXiv query in the same API call
- Paper Discovery: Searches ArXiv Atom API with parallel PDF downloads
- Fallback Handling: Uses heuristic keyword extraction if Gemini is rate-limited
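The fallback path can be illustrated with a minimal sketch. The stop-word list, ranking rule, and `heuristic_keywords` name are illustrative assumptions, not the repository's actual implementation; the `all:"..."` field prefix is standard ArXiv query syntax.

```python
import re
from collections import Counter

# Illustrative stop-word list; the real agent's list may differ.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "in", "on", "for", "to",
              "what", "how", "why", "does", "do", "and", "or", "with"}

def heuristic_keywords(question: str, top_n: int = 5) -> list[str]:
    """Fallback keyword extraction used when the Gemini call is unavailable:
    keep non-stop-word tokens, ranked by frequency then length."""
    tokens = re.findall(r"[a-z0-9]+", question.lower())
    candidates = [t for t in tokens if t not in STOP_WORDS and len(t) > 2]
    counts = Counter(candidates)
    ranked = sorted(counts, key=lambda t: (-counts[t], -len(t)))
    return ranked[:top_n]

# Build an ArXiv query string from the extracted keywords.
query = " AND ".join(f'all:"{k}"' for k in heuristic_keywords(
    "How do transformer attention mechanisms scale with sequence length?"))
print(query)
```

This keeps the pipeline functional under rate limiting at the cost of less precise search terms than the Gemini-extracted ones.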
2. RAG Pipeline¶
Purpose: Retrieves and prepares relevant paper content
- Chunking: Splits papers into overlapping segments (1000 chars, 200 overlap)
- TF-IDF Indexing: scikit-learn vectorization with stop-word removal
- Retrieval: Cosine similarity matching, returns top-k chunks (default: 5)
- Source Tracking: Every chunk labeled with `arxiv:XXXX.XXXXX` for citations
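The chunking and retrieval steps above can be sketched with scikit-learn. Function names and the exact window logic are illustrative assumptions; the parameters (1000-char chunks, 200 overlap, top-k of 5) come from this page.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Split a paper into overlapping character windows (1000 chars, 200 overlap)."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def retrieve(chunks: list[str], question: str, top_k: int = 5) -> list[tuple[str, float]]:
    """Rank chunks against the question by TF-IDF cosine similarity."""
    vectorizer = TfidfVectorizer(stop_words="english")  # stop-word removal
    matrix = vectorizer.fit_transform(chunks + [question])
    scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
    ranked = scores.argsort()[::-1][:top_k]
    return [(chunks[i], float(scores[i])) for i in ranked]
```

In the real pipeline each chunk would also carry its `arxiv:XXXX.XXXXX` label so retrieved passages stay citable.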
3. ScaleDown Compression¶
Purpose: Reduces token count while preserving semantic meaning
- Context Compression: 40-60% token reduction before sending to LLM
- Query-Aware: Uses the user's question to guide what information to preserve
- Tokenizer Optimization: Optimized for the `gemini-2.5-flash` tokenizer
- Artifact Compression: Also compresses COT traces before storage
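The ScaleDown request format itself isn't documented on this page, so rather than guess at its API, here is a sketch of the compression-stats bookkeeping that accompanies each compressed context (see Artifact Storage below). A whitespace split stands in for the real `gemini-2.5-flash` tokenizer, and the function name is illustrative.

```python
def compression_stats(original: str, compressed: str) -> dict:
    """Token-count bookkeeping recorded alongside each compressed context.
    NOTE: whitespace tokenization is a stand-in for the real tokenizer."""
    orig_tokens = len(original.split())
    comp_tokens = len(compressed.split())
    reduction = 1 - comp_tokens / orig_tokens if orig_tokens else 0.0
    return {"original_tokens": orig_tokens,
            "compressed_tokens": comp_tokens,
            "reduction_pct": round(reduction * 100, 1)}
```

A healthy run should show `reduction_pct` in the 40-60 range quoted above; values far outside that band are a signal to inspect the compressed context.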
4. Anti-Hallucination Workflow¶
Purpose: Multi-stage verification ensures factual accuracy
- Chain-of-Thought (COT): Requires inline citations for every claim
- Self-Verification: Separate Gemini call checks each citation
- Self-Critique: Optional quality evaluation
- Configurable: Stages can be toggled or reordered via CLI
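The stage chaining can be sketched as follows. The prompts, `run_workflow` name, and result structure are illustrative assumptions; `llm(prompt, max_tokens)` stands in for the actual Gemini client, and the token budgets (2048 for COT, 1024 for the fast verification and critique calls) come from the diagram above.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class WorkflowResult:
    stages: dict = field(default_factory=dict)  # stage name -> stage output

def run_workflow(question: str, context: str,
                 llm: Callable[[str, int], str],
                 enable_critique: bool = True) -> WorkflowResult:
    """Chain the three stages; each later stage sees the earlier output.
    `llm(prompt, max_tokens)` is a stand-in for the Gemini client."""
    result = WorkflowResult()
    cot = llm(f"Answer with an inline [arxiv:ID] citation for every claim.\n"
              f"Context:\n{context}\n\nQuestion: {question}", 2048)
    result.stages["cot"] = cot
    result.stages["self_verify"] = llm(
        f"Check that every citation in this answer is supported by the "
        f"context.\nAnswer:\n{cot}\nContext:\n{context}", 1024)
    if enable_critique:  # optional stage, toggled via CLI in the real system
        result.stages["self_critique"] = llm(
            f"Critique this answer for clarity and completeness:\n{cot}", 1024)
    return result
```

Separating verification into its own call matters: a model asked to check citations against the context is far less likely to rubber-stamp its own generation than one asked to self-assess in the same turn.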
5. Artifact Storage¶
Purpose: Persistent storage of reasoning traces
- Markdown Format: Each stage output saved as a `.md` file
- Compression: Artifacts compressed via ScaleDown before storage
- Metadata: Timestamps, token counts, compression stats tracked
- Organization: Separate folders for `cot/`, `self_verify/`, `self_critique/`
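A minimal sketch of the storage layout described above. The `save_artifact` name, timestamp filename scheme, and sidecar `.meta.json` file are illustrative assumptions; the per-stage folders and tracked metadata (timestamps, token counts, compression stats) are from this page.

```python
import json
import time
from pathlib import Path

def save_artifact(root: Path, stage: str, content: str, stats: dict) -> Path:
    """Write one stage's output as Markdown into its stage folder
    (cot/, self_verify/, self_critique/), with metadata alongside."""
    folder = root / stage
    folder.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    path = folder / f"{stamp}.md"
    path.write_text(content, encoding="utf-8")
    # Sidecar metadata file: timestamp plus token/compression stats.
    (folder / f"{stamp}.meta.json").write_text(
        json.dumps({"timestamp": stamp, **stats}), encoding="utf-8")
    return path
```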
6. Session Management¶
Purpose: Maintains conversation history across interactions
- JSON Persistence: Each session stored as `{session_id}.json`
- Multi-Turn Support: Previous Q&A included in context
- Paper Tracking: Tracks which papers were ingested per session
- Automatic Loading: Latest session auto-loaded if applicable
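The persistence and auto-loading behavior can be sketched like this. The session dict shape (`history`, `papers` keys), function names, and most-recently-modified selection rule are illustrative assumptions about the implementation.

```python
import json
from pathlib import Path

def load_latest_session(sessions_dir: Path) -> dict:
    """Auto-load the most recently modified {session_id}.json,
    or start a fresh session if none exists."""
    files = sorted(sessions_dir.glob("*.json"),
                   key=lambda p: p.stat().st_mtime, reverse=True)
    if not files:
        return {"history": [], "papers": []}
    return json.loads(files[0].read_text(encoding="utf-8"))

def append_turn(session: dict, question: str, answer: str) -> None:
    """Record one Q&A turn so later prompts can include prior context."""
    session["history"].append({"q": question, "a": answer})
```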
Data Flow¶
Question Processing Flow¶
- User submits question → Research Agent
- Triage classification → Routes to appropriate handler
- Paper discovery (if needed) → ArXiv API + parallel PDF downloads
- Text extraction → PyPDF2 processes PDFs
- Chunking + Indexing → TF-IDF vectorization
- Retrieval → Top-k chunks by cosine similarity
- Compression → ScaleDown reduces token count
- Reasoning → Multi-stage workflow generates answer
- Storage → Artifacts saved, session updated
- Response → Final answer with citations returned
Technology Stack¶
| Layer | Technology | Purpose |
|---|---|---|
| Intelligence | Google Gemini 2.5 Flash | Answer generation, classification, verification |
| Compression | ScaleDown API | Context compression, fallback generation |
| Paper Source | ArXiv Atom API | Scientific paper search and metadata |
| PDF Processing | PyPDF2 | Text extraction from PDFs |
| Retrieval | scikit-learn TF-IDF | Vectorization and similarity search |
| Storage | JSON (sessions), Markdown (artifacts) | Persistence |
| CLI | Rich (terminal UI) | Interactive tables, panels, markdown rendering |
Next: How It Works¶
See How It Works for the detailed end-to-end flow and step-by-step breakdown.