Retrieval
Current retrieval pipeline
The primary retrieval implementation is app/services/retrieval_pipeline.py.
The pipeline does the following:
- Detect query language with GPTMiniService.detect_language
- Translate the query when the query language differs from the corpus language
- Generate an embedding with EmbeddingService
- Query Pinecone for semantic candidates
- Query Supabase Postgres RPC search_chunks_lexical for lexical candidates
- Blend semantic and lexical scores with weighted reciprocal-rank fusion
- Apply deterministic profile-aware reranking
- Apply GPT-mini reranking
- Fetch full chunk text from Postgres-backed chunk storage
Weighting and ranking
Current hard-coded weights are:
- semantic weight: 0.75
- lexical weight: 0.25
- reciprocal-rank fusion constant: 60
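As an illustration of how these weights and the fusion constant interact, here is a minimal sketch of weighted reciprocal-rank fusion. It is not taken from app/services/retrieval_pipeline.py: the function name weighted_rrf, its signature, and the tie-breaking rule are assumptions. Only the three constants come from the list above.

```python
from collections import defaultdict

# Values from the documented hard-coded weights.
SEMANTIC_WEIGHT = 0.75
LEXICAL_WEIGHT = 0.25
RRF_K = 60


def weighted_rrf(semantic_ids, lexical_ids,
                 semantic_weight=SEMANTIC_WEIGHT,
                 lexical_weight=LEXICAL_WEIGHT,
                 k=RRF_K):
    """Fuse two ranked lists of chunk ids (hypothetical helper).

    Each list contributes weight / (k + rank) per chunk, with ranks
    starting at 1. Chunks missing from a list contribute nothing for it.
    """
    scores = defaultdict(float)
    for weight, ranked in ((semantic_weight, semantic_ids),
                           (lexical_weight, lexical_ids)):
        for rank, chunk_id in enumerate(ranked, start=1):
            scores[chunk_id] += weight / (k + rank)
    # Highest fused score first; ties broken by chunk id (an assumption)
    # so that the ordering is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


fused = weighted_rrf(["a", "b", "c"], ["c", "a", "d"])
for chunk_id, score in fused:
    print(chunk_id, round(score, 6))
```

Because the constant 60 dominates the denominator for small ranks, the gap between adjacent ranks is small, and a chunk that appears in both lists tends to outrank one that appears high in only one of them.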
Profile-aware reranking boosts candidates when chunk metadata matches:
- grade
- subject
- major
- subscription tier multiplier
Tier limits are resolved through app/services/tier_config.py.
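The profile-aware step can be pictured with the sketch below. The Candidate class, the per-field boost values, and the way the tier multiplier scales the boost are all hypothetical and chosen only for illustration. The source states which fields are matched, not how much each match is worth or how the multiplier is applied.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """Hypothetical container for a fused retrieval candidate."""
    chunk_id: str
    score: float
    metadata: dict = field(default_factory=dict)


# Hypothetical per-field boosts; the real values are not documented here.
FIELD_BOOSTS = {"grade": 0.10, "subject": 0.10, "major": 0.05}


def profile_rerank(candidates, profile, tier_multiplier=1.0):
    """Boost candidates whose metadata matches the user profile.

    Deterministic: the same inputs always produce the same ordering.
    """
    reranked = []
    for cand in candidates:
        boost = sum(
            value
            for name, value in FIELD_BOOSTS.items()
            if profile.get(name) is not None
            and cand.metadata.get(name) == profile[name]
        )
        # Assumption: the tier multiplier scales the total boost.
        new_score = cand.score * (1.0 + boost * tier_multiplier)
        reranked.append(Candidate(cand.chunk_id, new_score, cand.metadata))
    reranked.sort(key=lambda c: (-c.score, c.chunk_id))
    return reranked


candidates = [
    Candidate("x", 0.50, {"grade": 10, "subject": "math"}),
    Candidate("y", 0.55, {"grade": 9, "subject": "history"}),
]
result = profile_rerank(candidates, {"grade": 10, "subject": "math"})
print([c.chunk_id for c in result])
```

In this sketch a candidate with a lower fused score can overtake a higher one when it matches more profile fields, which is the behavior the boost list describes.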
Multilingual behavior
Chat language handling is split across:
- app/services/language_context.py
- app/services/gpt_mini.py
- app/services/llm.py
Current behavior:
- prompt and response language follow the user request
- retrieval language is normalized to English in chat flow
- canonical chunk storage for hybrid ingestion is English-first
- final answers are guarded for citation use and language consistency
Chat orchestration around retrieval
app/agents/teacher_agent.py wraps retrieval inside a LangGraph state machine:
- check_wallet
- retrieve
- ask_clarifying
- finalize
The graph asks for clarification when:
- the question is vague and context is missing
- retrieval returns no matches
- the best retrieval score is weak
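The three clarification conditions above can be expressed as a single routing check. The sketch below is plain Python rather than LangGraph code, and the function name, its parameters, the returned reason strings, and the MIN_TOP_SCORE threshold are hypothetical. The source does not say what score counts as weak.

```python
from typing import Optional, Sequence

# Hypothetical threshold; the actual cutoff is not documented here.
MIN_TOP_SCORE = 0.35


def clarification_reason(scores: Sequence[float],
                         question_is_vague: bool,
                         has_context: bool,
                         min_top_score: float = MIN_TOP_SCORE) -> Optional[str]:
    """Return why clarification is needed, or None to proceed to the answer."""
    if question_is_vague and not has_context:
        return "vague_question"
    if not scores:
        return "no_matches"
    if max(scores) < min_top_score:
        return "weak_top_score"
    return None


reason = clarification_reason([0.12, 0.08],
                              question_is_vague=False,
                              has_context=True)
next_node = "ask_clarifying" if reason else "finalize"
print(reason, next_node)
```

A function of this shape could serve as the condition on the edge leaving the retrieve node, sending the state either to ask_clarifying or to finalize.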
Operational caveats
- Retrieval depends on both Pinecone and the lexical SQL RPC being healthy.
- The final answer generator enforces source-grounding heuristics, so weak citations can degrade into uncertainty responses even when retrieval succeeds.
- Cached rerank results live in an in-memory cache, so they are not shared across processes or deploys.
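To make the last caveat concrete, here is a minimal sketch of a process-local cache. The class name, the time-to-live behavior, and the key format are assumptions, not a description of the actual cache in the codebase. The point is that the entries live in a plain dictionary inside one Python process.

```python
import time


class InMemoryRerankCache:
    """Hypothetical process-local cache with a simple time-to-live."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._entries = {}   # key -> (expires_at, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired: drop the entry and report a miss.
            del self._entries[key]
            return None
        return value

    def set(self, key, value):
        self._entries[key] = (self._clock() + self._ttl, value)


worker_a = InMemoryRerankCache()
worker_b = InMemoryRerankCache()  # stands in for a second process
worker_a.set(("query-hash", "profile-hash"), ["chunk-1", "chunk-2"])
print(worker_a.get(("query-hash", "profile-hash")))
print(worker_b.get(("query-hash", "profile-hash")))
```

The second instance never sees the first one's entries, which mirrors what happens across worker processes, and a restart or deploy starts from an empty dictionary.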