Hybrid Retrieval Fusion (Issue #29)

Summary

The retrieval pipeline now builds two candidate sets for each query:

  1. Semantic candidates from Pinecone (dense vectors)
  2. Lexical candidates from Supabase/Postgres full-text search (search_chunks_lexical RPC)

Candidates from both sets are merged, scored with a 75/25 semantic/lexical weighted blend, and then ordered with weighted Reciprocal Rank Fusion (RRF).

Scoring model

1) Weighted blend score (observability + confidence)

For each candidate document d, each raw score is normalized by the largest raw score in its list:

  • semantic_norm(d) = semantic_raw(d) / max_semantic_raw
  • lexical_norm(d) = lexical_raw(d) / max_lexical_raw

Then:

weighted_blend_score(d) = 0.75 * semantic_norm(d) + 0.25 * lexical_norm(d)

This is surfaced as result.score for downstream compatibility.
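
A minimal sketch of the blend computation above, not the pipeline's actual code. The function signature, the convention that a missing score is passed as 0.0, and the guard against a zero maximum are assumptions made for illustration.

```python
def weighted_blend_score(
    semantic_raw: float,
    lexical_raw: float,
    max_semantic_raw: float,
    max_lexical_raw: float,
    w_semantic: float = 0.75,
    w_lexical: float = 0.25,
) -> float:
    """Blend normalized semantic and lexical scores with 75/25 weights."""
    # Assumption: when a list is empty (max is 0), its normalized score is 0.0.
    semantic_norm = semantic_raw / max_semantic_raw if max_semantic_raw > 0 else 0.0
    lexical_norm = lexical_raw / max_lexical_raw if max_lexical_raw > 0 else 0.0
    return w_semantic * semantic_norm + w_lexical * lexical_norm


# Top semantic hit (0.8 of max 0.8) with half the top lexical score (2.0 of max 4.0):
# 0.75 * 1.0 + 0.25 * 0.5 = 0.875
score = weighted_blend_score(0.8, 2.0, max_semantic_raw=0.8, max_lexical_raw=4.0)
```

Because both inputs are divided by their list maximum, the blend stays in the range 0 to 1 whenever raw scores are non-negative, which is convenient for a field consumed downstream as result.score.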

2) Weighted RRF score (final ordering)

With k = 60, ranks starting at 1:

rrf_score(d) = 0.75 / (k + rank_semantic(d)) + 0.25 / (k + rank_lexical(d))

If a candidate appears in only one list, the term for the list it is absent from contributes 0.
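
The formula can be sketched as follows. This is an illustration under stated assumptions: the function name is hypothetical, and an absent rank is represented here as None.

```python
from typing import Optional


def weighted_rrf_score(
    rank_semantic: Optional[int],
    rank_lexical: Optional[int],
    k: int = 60,
    w_semantic: float = 0.75,
    w_lexical: float = 0.25,
) -> float:
    """Weighted Reciprocal Rank Fusion with 1-based ranks."""
    score = 0.0
    # A candidate missing from a list (rank None, an assumption of this sketch)
    # simply skips that term, which is equivalent to adding 0.
    if rank_semantic is not None:
        score += w_semantic / (k + rank_semantic)
    if rank_lexical is not None:
        score += w_lexical / (k + rank_lexical)
    return score


both_first = weighted_rrf_score(1, 1)        # 0.75/61 + 0.25/61 = 1/61
semantic_only = weighted_rrf_score(1, None)  # 0.75/61
lexical_only = weighted_rrf_score(None, 1)   # 0.25/61
```

With these weights, a document ranked first only in the semantic list scores three times higher than one ranked first only in the lexical list, and a document ranked first in both scores 1/61, the maximum possible value.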

Final ordering sorts by:

  1. rrf_score (descending)
  2. weighted_blend_score (descending)
  3. raw semantic/lexical scores (descending tie-break)
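
One way to express this three-level ordering is a single tuple sort key. The dictionary field names below are hypothetical, and the sketch assumes the raw-score tie-break compares the semantic score before the lexical score, which the description above does not specify.

```python
from typing import Dict, List


def order_candidates(candidates: List[Dict]) -> List[Dict]:
    """Sort by rrf_score, then blend score, then raw scores, all descending."""

    def ordering_key(c: Dict):
        # Negating each value turns Python's ascending sort into a descending one.
        return (
            -c["rrf_score"],
            -c["weighted_blend_score"],
            -c["semantic_raw"],  # assumption: semantic before lexical
            -c["lexical_raw"],
        )

    return sorted(candidates, key=ordering_key)


sample = [
    {"id": "a", "rrf_score": 0.0164, "weighted_blend_score": 0.90, "semantic_raw": 0.80, "lexical_raw": 2.0},
    {"id": "b", "rrf_score": 0.0164, "weighted_blend_score": 0.95, "semantic_raw": 0.70, "lexical_raw": 3.0},
    {"id": "c", "rrf_score": 0.0200, "weighted_blend_score": 0.10, "semantic_raw": 0.10, "lexical_raw": 0.5},
    {"id": "d", "rrf_score": 0.0164, "weighted_blend_score": 0.90, "semantic_raw": 0.85, "lexical_raw": 1.0},
]
ordered_ids = [c["id"] for c in order_candidates(sample)]
```

In the sample, c wins on rrf_score alone despite its low blend score; b, d, and a tie on rrf_score, so the blend score places b first, and the raw semantic score separates d from a.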

Retrieval traceability logs

Each retrieval call logs:

  • candidate counts (semantic, lexical, fused) and fusion-stage latency (fusion_latency_ms)
  • top raw semantic and lexical scores
  • top fused outputs with id, blend score, rrf score, and source contributions

Log markers:

  • HYBRID_RETRIEVAL candidates ...
  • HYBRID_RETRIEVAL semantic_scores=... lexical_scores=...
  • HYBRID_RETRIEVAL fusion_top=...

Each returned chunk also includes:

  • metadata.retrieval_sources (["semantic"], ["lexical"], or both)
  • metadata.hybrid_fusion with rank/score breakdown and RRF components

Latency regression gate (CI)

To keep hybrid fusion release-stable, we enforce a deterministic p95 latency gate in tests/services/test_retrieval_pipeline.py:

  • Test: test_hybrid_fusion_latency_regression_gate_p95_within_budget
  • Workload: 100 semantic candidates + 80 lexical candidates (with overlap), top_k=25
  • Iterations: RetrievalPipeline.HYBRID_FUSION_LATENCY_BENCHMARK_ITERATIONS (60)
  • Threshold: RetrievalPipeline.HYBRID_FUSION_LATENCY_BUDGET_MS (20.0 ms p95)

Measurement method:

  1. Run warm-up fusion passes to reduce interpreter cold-start jitter.
  2. Measure each _fuse_hybrid_candidates(...) call using perf_counter().
  3. Compute p95 over the sampled durations and fail CI if p95 exceeds the budget.
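
The measurement steps above might look roughly like this. The helper names, the warm-up count, and the nearest-rank percentile definition are assumptions; the real test may compute p95 differently.

```python
from time import perf_counter
from typing import Callable, List


def p95(samples: List[float]) -> float:
    """Nearest-rank 95th percentile (one common definition, assumed here)."""
    ordered = sorted(samples)
    # Integer ceiling of 0.95 * n avoids floating-point rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def measure_p95_ms(fn: Callable[[], object], iterations: int = 60, warmup: int = 5) -> float:
    """Time fn() repeatedly and return the p95 duration in milliseconds."""
    for _ in range(warmup):
        fn()  # warm-up passes are run but not recorded
    durations_ms = []
    for _ in range(iterations):
        start = perf_counter()
        fn()
        durations_ms.append((perf_counter() - start) * 1000.0)
    return p95(durations_ms)


# Stand-in workload; the real gate times _fuse_hybrid_candidates(...) instead.
observed_p95_ms = measure_p95_ms(lambda: sorted(range(1000), reverse=True))
within_budget = observed_p95_ms <= 20.0
```

With 60 iterations, the nearest-rank p95 is the 57th-smallest sample, so up to three slow outliers are tolerated before the gate fails.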

This gate is intentionally scoped to the fusion stage (not network I/O) so regressions in ranking complexity are caught reliably in unit-test conditions.

DB function

Migration: db/migrations/20260301000031_hybrid_retrieval_lexical_rpc.sql

Adds RPC function:

  • search_chunks_lexical(p_query_text, p_match_count, p_grade, p_subject, p_language)

Returns, for each hit:

  • chunk_id
  • bm25_score (lexical score based on Postgres ts_rank_cd, despite the field name)
  • metadata payload (grade, subject, source URL, page, canonicalization/BM25 fields, preview)
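
A hedged sketch of calling the RPC from Python. It assumes the caller holds a supabase-py style client whose rpc(name, params).execute() returns a response with a .data attribute. The wrapper name, the parameter types, and the default match count are hypothetical; only the RPC name and the p_* parameter names come from the migration described above.

```python
from typing import Any, Dict, List, Optional


def fetch_lexical_candidates(
    client: Any,
    query_text: str,
    match_count: int = 20,  # hypothetical default, not taken from the migration
    grade: Optional[str] = None,
    subject: Optional[str] = None,
    language: Optional[str] = None,
) -> List[Dict]:
    """Call search_chunks_lexical and return its rows (empty list if none)."""
    params = {
        "p_query_text": query_text,
        "p_match_count": match_count,
        "p_grade": grade,
        "p_subject": subject,
        "p_language": language,
    }
    # Assumption: client follows the supabase-py pattern rpc(...).execute().
    response = client.rpc("search_chunks_lexical", params).execute()
    return response.data or []
```

Taking the client as an argument keeps the wrapper easy to exercise in unit tests with a stub, in the same spirit as the network-free latency gate.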