Reasoning-grounded synthesis (Issue #31)

Summary

The final-answer synthesis stage now uses a dedicated OpenAI reasoning model, configured via OPENAI_REASONING_MODEL (default: o4-mini), with explicit grounding and safety constraints.
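As a minimal sketch of how the model name might be resolved, assuming it is read from the process environment: the helper name `resolve_reasoning_model` is hypothetical and not taken from the codebase. Only the variable name and its default come from the description above.

```python
import os

# Default stated in this document.
DEFAULT_REASONING_MODEL = "o4-mini"


def resolve_reasoning_model(env=None):
    """Return the configured reasoning model, or the default if unset or blank.

    `env` is injectable so the lookup can be tested without touching
    the real process environment (hypothetical helper).
    """
    env = os.environ if env is None else env
    value = env.get("OPENAI_REASONING_MODEL", "").strip()
    return value or DEFAULT_REASONING_MODEL
```

Treating a blank value as unset avoids sending an empty model name to the API when the variable is defined but left empty in a deployment file.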

Prompt contract

app/services/llm.py::generate_answer now enforces this synthesis contract:

- Use only facts grounded in the retrieved context snippets.
- Include source citations (e.g., [1], [2]) for factual claims.
- If the context is insufficient, state the uncertainty explicitly instead of guessing.
- Keep the output in the requested language while preserving the format the student requested.

For exercise-type questions, the model is further constrained to provide guided hints rather than full final solutions.
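The contract above could be assembled into prompt instructions along these lines. This is an illustrative sketch only: the function name `build_synthesis_instructions`, its parameters, and the exact rule wording are assumptions, not the actual text used by generate_answer.

```python
def build_synthesis_instructions(language: str, is_exercise: bool) -> str:
    """Assemble the synthesis rules as a numbered instruction block.

    Hypothetical helper: the real prompt in app/services/llm.py may be
    worded and structured differently.
    """
    rules = [
        "Use only facts grounded in the retrieved context snippets.",
        "Cite sources with bracketed markers such as [1], [2] for factual claims.",
        "If the context is insufficient, say so explicitly instead of guessing.",
        f"Answer in '{language}' and keep the format the student asked for.",
    ]
    if is_exercise:
        # Extra constraint applied only to exercise-type questions.
        rules.append("Give guided hints rather than a full final solution.")
    return "\n".join(f"{i}. {rule}" for i, rule in enumerate(rules, start=1))
```

Keeping the exercise rule as a conditional append means the base contract stays identical for every question type.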

Guardrails

1) Empty-context fallback

If context is empty or set to NO_CONTEXT_FOUND, answer generation bypasses model calls and returns a deterministic uncertainty response in the target language.
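A sketch of this short-circuit, assuming NO_CONTEXT_FOUND is a string sentinel and that the fallback text is looked up per language. The message strings and the function name `empty_context_fallback` are placeholders invented for illustration.

```python
from typing import Optional

# Assumption: the sentinel is a plain string with this value.
NO_CONTEXT_FOUND = "NO_CONTEXT_FOUND"

# Placeholder wording; the real fallback messages are not given in this document.
UNCERTAINTY_MESSAGES = {
    "en": "I don't have enough information in the course material to answer this.",
    "fr": "Je n'ai pas assez d'informations dans les documents du cours pour répondre.",
    "ar": "لا تتوفر لدي معلومات كافية في مواد الدرس للإجابة عن هذا السؤال.",
}


def empty_context_fallback(context: Optional[str], language: str) -> Optional[str]:
    """Return a fixed uncertainty message when there is no usable context.

    Returns None when context is present, meaning the caller should
    proceed with the model call.
    """
    stripped = (context or "").strip()
    if not stripped or stripped == NO_CONTEXT_FOUND:
        return UNCERTAINTY_MESSAGES.get(language, UNCERTAINTY_MESSAGES["en"])
    return None
```

Because the response is a fixed lookup rather than model output, the same input always yields the same answer and no API call is made.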

2) Hallucination guard

When the retrieved context includes citation IDs, final non-streaming answers are post-validated. Unless the answer is already an uncertainty response, it is replaced with an uncertainty-safe fallback plus the available citation IDs if either of the following holds:

- no citations appear in the answer, or
- the answer includes citation IDs that are not present in the retrieved context.

Citation parsing accepts compact ([1]), comma-separated ([1, 2]), and range ([1-2]) reference notation to reduce false negatives while still rejecting unknown IDs.
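The parsing and validation described in this section can be sketched as follows. The regular expression and the names `extract_citation_ids` and `needs_fallback` are assumptions made for illustration; the actual implementation may differ, for example in how it detects an uncertainty response (passed in here as a flag).

```python
import re

# One item is a single ID ("1") or a range ("1-2"); items are comma-separated.
_ITEM = r"\d+(?:\s*-\s*\d+)?"
_BRACKET = re.compile(rf"\[\s*({_ITEM}(?:\s*,\s*{_ITEM})*)\s*\]")


def extract_citation_ids(text):
    """Collect every cited ID from [1], [1, 2] and [1-2] style markers."""
    ids = set()
    for match in _BRACKET.finditer(text):
        for part in match.group(1).split(","):
            if "-" in part:
                low, high = sorted(int(p) for p in part.split("-"))
                ids.update(range(low, high + 1))  # ranges are inclusive
            else:
                ids.add(int(part))
    return ids


def needs_fallback(answer, context_ids, is_uncertainty):
    """Decide whether the answer should be replaced by the safe fallback."""
    if not context_ids or is_uncertainty:
        return False  # guard only applies when the context carries IDs
    cited = extract_citation_ids(answer)
    return not cited or not cited.issubset(context_ids)
```

Expanding ranges into individual IDs lets a marker such as [1-2] pass when both sources exist, while a marker that reaches beyond the retrieved IDs is still rejected.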

3) Language safety

If the final answer drifts from the requested output language (fr/ar/en), a translation guard translates it back while preserving:

- citation markers ([n])
- math notation
- bullet and paragraph structure
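One way to check that a translation pass kept the citation markers intact is to compare the markers before and after. This is a hedged sketch of such a check, not the guard itself: the name `citation_markers_preserved` is hypothetical, and language detection and the translation call are deliberately left out.

```python
import re
from collections import Counter

# Matches [1], [1, 2] and [1-2] style markers.
_MARKER = re.compile(r"\[\s*\d+(?:\s*[-,]\s*\d+)*\s*\]")


def _marker_counts(text):
    """Count markers, ignoring whitespace differences inside the brackets."""
    return Counter(re.sub(r"\s+", "", m) for m in _MARKER.findall(text))


def citation_markers_preserved(original, translated):
    """True when both texts contain the same markers the same number of times.

    Order is ignored, since translation can legitimately reorder clauses.
    """
    return _marker_counts(original) == _marker_counts(translated)
```

A caller could fall back to the untranslated answer, or retry the translation, when this check fails.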

Tests

Added/updated tests in tests/services/test_llm.py:

- empty-context behavior without a model call
- hallucination guardrail behavior (missing citations and out-of-context citation IDs)
- language-safety translation behavior for Arabic, French, and English targets
- existing baseline non-streaming generation