
Retrieval

Current retrieval pipeline

The primary retrieval implementation is app/services/retrieval_pipeline.py.

The pipeline does the following:

  1. Detect query language with GPTMiniService.detect_language
  2. Translate the query when the query language differs from the corpus language
  3. Generate an embedding with EmbeddingService
  4. Query Pinecone for semantic candidates
  5. Call the Supabase Postgres RPC search_chunks_lexical for lexical candidates
  6. Fuse the semantic and lexical rankings with weighted reciprocal-rank fusion (RRF)
  7. Apply deterministic profile-aware reranking
  8. Apply GPT-mini reranking
  9. Fetch full chunk text from Postgres-backed chunk storage
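The ordering of these steps can be sketched as one function with every external service injected as a callable. This is a hypothetical outline, not the code in retrieval_pipeline.py: PipelineDeps, run_pipeline, the plain chunk-ID lists, and the corpus_language default of "en" are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable


# Hypothetical dependency bundle: each callable stands in for a real service
# (GPTMiniService, EmbeddingService, Pinecone, the lexical RPC, and so on).
@dataclass
class PipelineDeps:
    detect_language: Callable[[str], str]
    translate: Callable[[str, str], str]
    embed: Callable[[str], list[float]]
    semantic_search: Callable[[list[float]], list[str]]
    lexical_search: Callable[[str], list[str]]
    fuse: Callable[[list[str], list[str]], list[str]]
    profile_rerank: Callable[[list[str]], list[str]]
    llm_rerank: Callable[[str, list[str]], list[str]]
    fetch_text: Callable[[list[str]], list[str]]


def run_pipeline(query: str, deps: PipelineDeps, corpus_language: str = "en") -> list[str]:
    # Steps 1-2: translate only when the query language differs from the corpus.
    if deps.detect_language(query) != corpus_language:
        query = deps.translate(query, corpus_language)
    # Steps 3-5: semantic candidates from the embedding, lexical from the text.
    semantic_ids = deps.semantic_search(deps.embed(query))
    lexical_ids = deps.lexical_search(query)
    # Step 6: fuse both rankings into one candidate list.
    fused = deps.fuse(semantic_ids, lexical_ids)
    # Steps 7-8: deterministic profile rerank first, then the LLM rerank.
    reranked = deps.llm_rerank(query, deps.profile_rerank(fused))
    # Step 9: hydrate the final chunk IDs with their full text.
    return deps.fetch_text(reranked)
```

Injecting the services this way keeps the step order testable without network access.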

Weighting and ranking

The fusion parameters are currently hard-coded:

  • semantic weight: 0.75
  • lexical weight: 0.25
  • reciprocal-rank fusion constant (k): 60
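A minimal sketch of weighted RRF with these parameters, assuming the common formulation in which each chunk scores the sum of weight / (k + rank) over the rankings it appears in, with 1-based ranks. Whether retrieval_pipeline.py uses exactly this formulation is an assumption, and weighted_rrf is a hypothetical name.

```python
from collections import defaultdict

SEMANTIC_WEIGHT = 0.75
LEXICAL_WEIGHT = 0.25
RRF_K = 60


def weighted_rrf(semantic_ids, lexical_ids,
                 w_sem=SEMANTIC_WEIGHT, w_lex=LEXICAL_WEIGHT, k=RRF_K):
    """Return (chunk_id, fused_score) pairs, best first."""
    scores = defaultdict(float)
    for weight, ranking in ((w_sem, semantic_ids), (w_lex, lexical_ids)):
        # Ranks start at 1; k damps the advantage of the very top positions.
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += weight / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

With semantic ranking [a, b, c] and lexical ranking [c, b], the fused order is b, c, a: chunk a tops the semantic list but appears in only one ranking, so b and c, which collect contributions from both, overtake it.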

Profile-aware reranking adjusts candidate scores using:

  • a grade match between chunk metadata and the user profile
  • a subject match
  • a major match
  • a subscription tier multiplier

Tier limits are resolved through app/services/tier_config.py.
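One way such a deterministic rerank could work is sketched below. The boost sizes, the tier multipliers, and the choice to let the multiplier scale the profile boost (rather than the base score) are all hypothetical; the real tier values come from tier_config.py. Scaling the boost matters because a multiplier applied uniformly to every candidate's full score would not change the order at all.

```python
from dataclasses import dataclass, field

# Hypothetical boost sizes per matching profile field (not the real values).
MATCH_BOOST = {"grade": 0.10, "subject": 0.15, "major": 0.05}
# Hypothetical tier multipliers; real tier settings come from tier_config.py.
TIER_MULTIPLIER = {"free": 1.0, "premium": 1.5}


@dataclass
class Candidate:
    chunk_id: str
    score: float  # fused RRF score
    metadata: dict = field(default_factory=dict)


def profile_rerank(candidates, profile, tier):
    """Reorder candidates by fused score, boosted for profile matches."""
    multiplier = TIER_MULTIPLIER.get(tier, 1.0)

    def adjusted_score(c):
        # Sum the boosts for every profile field the chunk metadata matches.
        boost = sum(
            size for key, size in MATCH_BOOST.items()
            if key in profile and c.metadata.get(key) == profile[key]
        )
        # Assumption: the tier multiplier scales the boost, not the base score.
        return c.score * (1.0 + boost * multiplier)

    return sorted(candidates, key=adjusted_score, reverse=True)
```

With a subject-matched chunk at 0.018 and an unmatched one at 0.021, the free tier leaves the unmatched chunk first (0.0207 vs 0.021), while the premium tier lifts the matched chunk ahead (0.02205).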

Multilingual behavior

Chat language handling is split across:

  • app/services/language_context.py
  • app/services/gpt_mini.py
  • app/services/llm.py

Current behavior:

  • the prompt and response language follow the language of the user's request
  • in the chat flow, the retrieval query is normalized to English
  • canonical chunk storage for hybrid ingestion is English-first
  • final answers are checked for citation use and language consistency
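The split between response language and retrieval language can be pictured as a small resolved context object. This is a hypothetical sketch, not the API of language_context.py: LanguageContext, resolve_language_context, and the normalization of tags like "en-US" to a base code are assumptions.

```python
from dataclasses import dataclass

# English-first canonical chunk storage, so retrieval always runs in English.
RETRIEVAL_LANGUAGE = "en"


@dataclass(frozen=True)
class LanguageContext:
    response_language: str   # follows the user's request
    retrieval_language: str  # fixed to the corpus language
    needs_translation: bool  # whether the query must be translated first


def resolve_language_context(user_language: str) -> LanguageContext:
    # Assumption: reduce tags such as "en-US" or "ES" to a lowercase base code.
    base = user_language.split("-")[0].lower()
    return LanguageContext(
        response_language=base,
        retrieval_language=RETRIEVAL_LANGUAGE,
        needs_translation=base != RETRIEVAL_LANGUAGE,
    )
```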

Chat orchestration around retrieval

app/agents/teacher_agent.py wraps retrieval in a LangGraph state machine with these nodes:

  • check_wallet
  • retrieve
  • ask_clarifying
  • finalize

The graph asks for clarification when:

  • the question is vague and context is missing
  • retrieval returns no matches
  • the best retrieval score is below a confidence threshold
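The clarification rules can be expressed as one routing function evaluated after retrieve. This plain-Python sketch is hypothetical: AgentState, route_after_retrieve, and MIN_TOP_SCORE are illustrative names, the threshold value is a placeholder, and vagueness is taken as an input flag because how the agent detects it is not described here. In LangGraph, a function like this would typically be attached with StateGraph.add_conditional_edges from the retrieve node; the real graph may also check vagueness before retrieval.

```python
from dataclasses import dataclass, field

# Hypothetical confidence threshold for the best retrieval score.
MIN_TOP_SCORE = 0.5


@dataclass
class AgentState:
    question_is_vague: bool
    has_context: bool
    scores: list[float] = field(default_factory=list)


def route_after_retrieve(state: AgentState) -> str:
    """Return the name of the next node: ask_clarifying or finalize."""
    # Vague question with no context to disambiguate it.
    if state.question_is_vague and not state.has_context:
        return "ask_clarifying"
    # Retrieval returned no matches.
    if not state.scores:
        return "ask_clarifying"
    # Best match is too weak to ground an answer.
    if max(state.scores) < MIN_TOP_SCORE:
        return "ask_clarifying"
    return "finalize"
```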

Operational caveats

  • Retrieval depends on both Pinecone and the lexical SQL RPC being healthy.
  • The final answer generator enforces source-grounding heuristics, so answers backed by weak citations can fall back to uncertainty responses even when retrieval succeeds.
  • Rerank results are cached in process memory, so the cache is not shared across worker processes and does not survive a redeploy.
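The caveat follows from how an in-process cache works: each worker holds its own dictionary, and it disappears when the process restarts. A minimal sketch, assuming an LRU policy and a key built from the query plus candidate IDs (RerankCache and both choices are hypothetical, not the actual cache implementation):

```python
import hashlib
from collections import OrderedDict


class RerankCache:
    """Per-process LRU cache: every worker process holds a separate copy."""

    def __init__(self, max_entries=1024):
        self._entries = OrderedDict()
        self._max_entries = max_entries

    @staticmethod
    def key(query, chunk_ids):
        # Hash the query together with the candidate set it was reranked over.
        raw = query + "\x1f" + "\x1f".join(chunk_ids)
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def get(self, query, chunk_ids):
        k = self.key(query, chunk_ids)
        if k not in self._entries:
            return None
        self._entries.move_to_end(k)  # mark as most recently used
        return self._entries[k]

    def put(self, query, chunk_ids, ranked):
        k = self.key(query, chunk_ids)
        self._entries[k] = ranked
        self._entries.move_to_end(k)
        if len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict least recently used
```

Sharing results across processes or deploys would require moving these entries into an external store.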