Reasoning-grounded synthesis (Issue #31)
Summary
The final answer synthesis stage now uses a dedicated OpenAI reasoning model (OPENAI_REASONING_MODEL, default o4-mini) with explicit grounding and safety constraints.
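A minimal sketch of how the model name might be resolved. It assumes OPENAI_REASONING_MODEL is read from the environment, which the note above does not confirm; resolve_reasoning_model is a hypothetical helper, not a function from the codebase.

```python
import os

DEFAULT_REASONING_MODEL = "o4-mini"  # default stated in the summary above


def resolve_reasoning_model(env=None):
    """Return the configured reasoning model name, or the default.

    Hypothetical helper: assumes the setting is an environment variable.
    """
    env = os.environ if env is None else env
    value = (env.get("OPENAI_REASONING_MODEL") or "").strip()
    return value or DEFAULT_REASONING_MODEL
```

Treating a blank value the same as an unset one avoids sending an empty model name to the API.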
Prompt contract
app/services/llm.py::generate_answer now enforces this synthesis contract:
- Use only facts grounded in retrieved context snippets.
- Include source citations (e.g., [1], [2]) for factual claims.
- If context is insufficient, explicitly return uncertainty instead of guessing.
- Keep output in the requested language while preserving the student-requested format.
For exercise-type questions, the model is further constrained to provide guided hints rather than full final solutions.
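The contract above could be turned into model instructions roughly as follows. This is an illustrative sketch only: build_synthesis_instructions and the exact rule wording are assumptions, and the real prompt in generate_answer may differ.

```python
def build_synthesis_instructions(language, is_exercise=False):
    """Assemble the synthesis rules as a bulleted instruction block.

    Hypothetical helper; the rule wording paraphrases the contract above.
    """
    rules = [
        "Use only facts grounded in the retrieved context snippets.",
        "Cite sources with bracketed IDs such as [1], [2] for factual claims.",
        "If the context is insufficient, say so explicitly instead of guessing.",
        f"Answer in '{language}' and keep the format the student requested.",
    ]
    if is_exercise:
        # Extra constraint for exercise-type questions.
        rules.append("Give guided hints; do not give the full final solution.")
    return "\n".join(f"- {rule}" for rule in rules)
```

Keeping the exercise rule as a conditional append means the four base rules stay identical across question types.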
Guardrails
1) Empty-context fallback
If context is empty or set to NO_CONTEXT_FOUND, answer generation bypasses model calls and returns a deterministic uncertainty response in the target language.
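A minimal sketch of this short-circuit, assuming the context arrives as a string and the model call is injected as a callable. The message texts, UNCERTAINTY_MESSAGES, is_empty_context, and generate_answer_sketch are all hypothetical names for illustration.

```python
NO_CONTEXT_FOUND = "NO_CONTEXT_FOUND"

# Hypothetical wording; the actual fallback messages are not given in this note.
UNCERTAINTY_MESSAGES = {
    "en": "I could not find enough information in the course material to answer this.",
    "fr": "Je n'ai pas trouvé assez d'informations dans le cours pour répondre.",
    "ar": "لم أجد معلومات كافية في محتوى الدرس للإجابة عن هذا السؤال.",
}


def is_empty_context(context):
    """True when there is nothing to ground an answer on."""
    return context is None or context.strip() in ("", NO_CONTEXT_FOUND)


def generate_answer_sketch(question, context, language, call_model):
    """Return a fixed uncertainty message without calling the model
    when the context is empty; otherwise delegate to call_model."""
    if is_empty_context(context):
        return UNCERTAINTY_MESSAGES.get(language, UNCERTAINTY_MESSAGES["en"])
    return call_model(question, context, language)
```

Because the fallback is a plain dictionary lookup, the response is deterministic and a test can assert that the model callable was never invoked.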
2) Hallucination guard
When retrieved context includes citation IDs, final non-streaming answers are post-validated. The response is replaced with an uncertainty-safe fallback plus the available citation IDs when the answer is not already an uncertainty response and either:
- no citations appear in the answer, or
- the answer includes citation IDs that are not present in the retrieved context.
Citation parsing accepts compact ([1]), comma-separated ([1, 2]), and range ([1-2]) reference notation to reduce false negatives while still rejecting unknown IDs.
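The parsing and validation rules can be sketched as below. The function names, the regular expression, and the fallback format are assumptions made for illustration; only the accepted notations and the replace-on-failure behavior come from the description above.

```python
import re

# One item is "3" or a range "3-5"; a group is one or more comma-separated items.
_ITEM = r"\d+(?:\s*-\s*\d+)?"
_CITATION_GROUP = re.compile(rf"\[\s*({_ITEM}(?:\s*,\s*{_ITEM})*)\s*\]")


def extract_citation_ids(text):
    """Collect citation IDs from [1], [1, 2] and [1-2] style markers."""
    ids = set()
    for group in _CITATION_GROUP.findall(text):
        for part in group.split(","):
            if "-" in part:
                start, end = (int(p) for p in part.split("-"))
                ids.update(range(min(start, end), max(start, end) + 1))
            else:
                ids.add(int(part))
    return ids


def guard_answer(answer, context_ids, is_uncertainty, fallback):
    """Replace answers that cite nothing or cite IDs missing from the context.

    is_uncertainty is a hypothetical predicate for uncertainty responses;
    fallback is the hypothetical uncertainty-safe text.
    """
    if not context_ids or is_uncertainty(answer):
        return answer
    cited = extract_citation_ids(answer)
    if not cited or not cited <= set(context_ids):
        available = ", ".join(f"[{i}]" for i in sorted(context_ids))
        return f"{fallback} {available}"
    return answer
```

Expanding a range into its individual IDs is what lets [1-3] be rejected when only [1] and [2] were retrieved, while [1-2] passes.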
3) Language safety
If the final answer drifts from the requested output language (fr/ar/en), a translation guard translates it back while preserving:
- citation markers ([n])
- math notation
- bullets/paragraph structure
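This note does not say how the guard preserves these elements; it may simply be a prompt instruction. One possible approach, sketched here under that caveat, is to mask citation markers and inline $...$ math with placeholders, translate line by line, and restore the originals. The detect and translate callables and all function names are hypothetical.

```python
import re

# Citation markers such as [1], [1, 2], [1-2], and inline $...$ math.
_PROTECTED = re.compile(r"\[\d+(?:\s*[-,]\s*\d+)*\]|\$[^$\n]+\$")
# Optional bullet or numbered-list prefix, then the rest of the line.
_BULLET = re.compile(r"^(\s*(?:[-*]|\d+[.)])\s+)?(.*)$")


def translate_preserving_markers(text, target, translate):
    """Translate text while keeping markers, math, and line layout intact.

    translate(text, target) is a hypothetical callable that must leave
    the @@n@@ placeholders untouched.
    """
    saved = []

    def mask(match):
        saved.append(match.group(0))
        return f"@@{len(saved) - 1}@@"

    masked = _PROTECTED.sub(mask, text)
    out_lines = []
    for line in masked.split("\n"):
        prefix, body = _BULLET.match(line).groups()
        # Translate only the body so bullets and blank lines survive as-is.
        out_lines.append((prefix or "") + (translate(body, target) if body.strip() else body))
    restored = "\n".join(out_lines)
    return re.sub(r"@@(\d+)@@", lambda m: saved[int(m.group(1))], restored)


def enforce_language(answer, target, detect, translate):
    """Translate back only when the detected language differs from target."""
    if detect(answer) == target:
        return answer
    return translate_preserving_markers(answer, target, translate)
```

Masking makes preservation independent of how well the translator follows instructions, at the cost of relying on it to pass the placeholders through unchanged.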
Tests
Added/updated tests in tests/services/test_llm.py:
- empty-context behavior without model call
- hallucination guardrail behavior (missing citations + out-of-context citation IDs)
- language-safety translation behavior for Arabic, French, and English targets
- existing baseline non-stream generation