Profile-aware Retrieval Reranking (Issue #30)
Goal
Improve retrieval relevance by applying deterministic user-profile context signals before GPT reranking.
Profile fields used in retrieval path:
- grade
- subject
- major (when available)
- tier
Pipeline placement
The retrieval stack now runs:
- semantic + lexical candidate generation
- weighted hybrid fusion (Issue #29)
- deterministic profile-aware rerank (Issue #30)
- GPT-mini rerank
This ensures grade/subject/tier context is consistently represented even when LLM reranking is unavailable.
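The stage order above can be sketched as a simple composition. This is a minimal illustration, not the project's implementation: `run_retrieval` and the stage callables are hypothetical names, and treating any exception from the GPT stage as "unavailable" is an assumption.

```python
from typing import Callable, List, Optional

Candidates = List[dict]
Stage = Callable[[Candidates], Candidates]


def run_retrieval(
    candidates: Candidates,
    hybrid_fusion: Stage,
    profile_rerank: Stage,
    gpt_rerank: Optional[Stage] = None,
) -> Candidates:
    """Apply the rerank stages in the documented order (hypothetical wiring)."""
    ranked = hybrid_fusion(candidates)  # weighted hybrid fusion (Issue #29)
    ranked = profile_rerank(ranked)     # deterministic profile-aware rerank (Issue #30)
    if gpt_rerank is None:
        return ranked
    try:
        return gpt_rerank(ranked)       # GPT-mini rerank, when reachable
    except Exception:
        # Assumption: any failure here means the LLM reranker is unavailable,
        # so the profile-aware order is returned unchanged.
        return ranked
```

Because the deterministic stage always runs before the optional one, the result carries profile context whether or not the GPT stage succeeds.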
Deterministic boost formula
For each candidate chunk, we compute:
- grade_match boost: +0.12
- subject_match boost: +0.10
- major_match boost: +0.05
- grade+subject bonus: +0.08
Then apply tier multiplier:
- free: 1.00
- standard: 1.05
- premium: 1.10
Final contextual score:
contextual_score = hybrid_score + (raw_profile_boost * tier_multiplier)
Returned metadata includes a full breakdown under metadata.profile_rerank for
traceability.
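The formula can be sketched in Python as follows. This is an illustration under stated assumptions rather than the shipped code: the `Profile` and `Chunk` classes, the breakdown field names, and the choice to treat an unknown tier as 1.00 are all hypothetical, and the grade+subject bonus is assumed to apply only when both fields match.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

GRADE_BOOST = 0.12
SUBJECT_BOOST = 0.10
MAJOR_BOOST = 0.05
GRADE_SUBJECT_BONUS = 0.08
TIER_MULTIPLIER = {"free": 1.00, "standard": 1.05, "premium": 1.10}


@dataclass
class Profile:  # hypothetical shape
    grade: Optional[str] = None
    subject: Optional[str] = None
    major: Optional[str] = None
    tier: str = "free"


@dataclass
class Chunk:  # hypothetical shape
    hybrid_score: float
    grade: Optional[str] = None
    subject: Optional[str] = None
    major: Optional[str] = None


def contextual_score(chunk: Chunk, profile: Profile) -> Tuple[float, dict]:
    """Return (contextual_score, breakdown) for one candidate chunk."""
    # A field only counts as a match when the profile actually provides it.
    grade_match = profile.grade is not None and chunk.grade == profile.grade
    subject_match = profile.subject is not None and chunk.subject == profile.subject
    major_match = profile.major is not None and chunk.major == profile.major

    raw = 0.0
    if grade_match:
        raw += GRADE_BOOST
    if subject_match:
        raw += SUBJECT_BOOST
    if major_match:
        raw += MAJOR_BOOST
    if grade_match and subject_match:
        raw += GRADE_SUBJECT_BONUS  # assumption: bonus requires both matches

    multiplier = TIER_MULTIPLIER.get(profile.tier, 1.00)  # assumption: unknown tier is neutral
    score = chunk.hybrid_score + raw * multiplier
    breakdown = {  # hypothetical field names for metadata.profile_rerank
        "hybrid_score": chunk.hybrid_score,
        "raw_profile_boost": raw,
        "tier_multiplier": multiplier,
        "contextual_score": score,
    }
    return score, breakdown
```

For a premium user whose grade and subject both match a chunk with a hybrid score of 0.50, the raw boost is 0.12 + 0.10 + 0.08 = 0.30, scaled to 0.33, for a contextual score of about 0.83.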
Fallback behavior (missing profile data)
If no profile signals are available (grade, subject, and major all
missing), reranking is skipped and hybrid order is preserved.
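A minimal sketch of this guard, assuming dict-based profiles and a caller-supplied scoring function (`rerank_or_passthrough`, `has_profile_signals`, and `score_fn` are hypothetical names). Note that tier alone does not count as a signal here, matching the condition above.

```python
from typing import Any, Callable, List, Mapping, Sequence

SIGNAL_FIELDS = ("grade", "subject", "major")


def has_profile_signals(profile: Mapping[str, Any]) -> bool:
    """True if at least one of grade, subject, or major is present."""
    return any(profile.get(field) for field in SIGNAL_FIELDS)


def rerank_or_passthrough(
    candidates: Sequence[dict],
    profile: Mapping[str, Any],
    score_fn: Callable[[dict, Mapping[str, Any]], float],
) -> List[dict]:
    """Rerank by score_fn, or keep hybrid order when no signals exist."""
    if not has_profile_signals(profile):
        # Fallback: skip reranking and preserve the incoming hybrid order.
        return list(candidates)
    # sorted() is stable, so ties keep their hybrid order.
    return sorted(candidates, key=lambda c: score_fn(c, profile), reverse=True)
```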
In this fallback mode, each candidate includes:
Evaluation signal (relevance lift)
Regression test coverage demonstrates lift versus baseline ordering:
- baseline: a candidate with a higher hybrid score can rank first even if it is off-profile
- profile-aware rerank: a candidate matching both grade and subject is promoted above off-profile candidates with similar baseline scores
See:
tests/services/test_retrieval_pipeline.py::test_profile_rerank_boosts_grade_subject_matches_over_higher_baseline_score
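The lift can be illustrated with made-up numbers. The snippet below is not the repository test: the candidate scores, the `profile_boost` helper, and the dict fields are hypothetical, and a free-tier multiplier of 1.00 is assumed.

```python
def profile_boost(grade_match: bool, subject_match: bool,
                  major_match: bool = False,
                  tier_multiplier: float = 1.00) -> float:
    """Deterministic boost using the documented weights."""
    raw = 0.0
    if grade_match:
        raw += 0.12
    if subject_match:
        raw += 0.10
    if major_match:
        raw += 0.05
    if grade_match and subject_match:
        raw += 0.08  # assumption: bonus requires both matches
    return raw * tier_multiplier


# Illustrative candidates: the off-profile chunk has the higher hybrid score.
candidates = [
    {"id": "off_profile", "hybrid_score": 0.62, "grade_match": False, "subject_match": False},
    {"id": "on_profile", "hybrid_score": 0.55, "grade_match": True, "subject_match": True},
]

baseline = sorted(candidates, key=lambda c: c["hybrid_score"], reverse=True)
reranked = sorted(
    candidates,
    key=lambda c: c["hybrid_score"] + profile_boost(c["grade_match"], c["subject_match"]),
    reverse=True,
)

baseline_ids = [c["id"] for c in baseline]
reranked_ids = [c["id"] for c in reranked]
```

With these numbers the on-profile chunk gains 0.30 (0.55 to about 0.85) and overtakes the off-profile chunk at 0.62, which led the baseline order.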