Plan: GH-0028 Multilingual Chat Input and Output
Context
This issue covers the multilingual boundary of the backend chat flow: detect incoming prompt language, translate when needed for retrieval, and return answers in the requested language.
Problem
Students can ask in Arabic, French, or English, but retrieval and generation quality degrade if input language, retrieval language, and response language are not handled consistently.
Current state in repo
- app/services/retrieval_pipeline.py detects query language and can translate queries before retrieval.
- app/services/llm.py includes translation and output-language guard behavior.
- app/services/language_context.py holds lightweight language metadata logic.
- app/agents/teacher_agent.py tracks prompt, retrieval, and response language fields.
- Tests already cover translation and language-guard behavior.
Target state
- Chat input language is detected reliably.
- Retrieval uses a consistent canonical query language when required.
- Final answers are returned in the user's requested language while preserving citations and math formatting.
Constraints
- Backend-only scope.
- Retrieval and output language handling must remain explicit and auditable.
- Translation behavior must not break citation markers or grounded answers.
- The solution must integrate with the existing agent and service boundaries.
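One way to honor the citation and math constraint is to mask protected spans before translation and restore them afterwards. The sketch below is illustrative only: it assumes citation markers look like `[1]` and math is delimited with `$...$` or `$$...$$`, neither of which this plan confirms, and `translate_preserving_markers` and the `__KEEP_n__` placeholder are hypothetical names, not existing code in app/services/llm.py.

```python
import re
from typing import Callable

# Assumed formats: $$...$$ or $...$ for math, [n] for citation markers.
PROTECTED = re.compile(r"\$\$.+?\$\$|\$[^$\n]+\$|\[\d+\]", re.DOTALL)
PLACEHOLDER = re.compile(r"__KEEP_(\d+)__")


def translate_preserving_markers(text: str, translate: Callable[[str], str]) -> str:
    """Translate text while keeping citation markers and math spans byte-identical."""
    saved = []

    def stash(match):
        # Replace each protected span with an opaque, numbered placeholder.
        saved.append(match.group(0))
        return f"__KEEP_{len(saved) - 1}__"

    masked = PROTECTED.sub(stash, text)
    translated = translate(masked)

    # Fail loudly if the translator dropped or altered any placeholder.
    for index in range(len(saved)):
        if f"__KEEP_{index}__" not in translated:
            raise ValueError(f"translation lost protected span: {saved[index]!r}")

    # Put the original spans back in place of their placeholders.
    return PLACEHOLDER.sub(lambda m: saved[int(m.group(1))], translated)
```

Raising on a lost placeholder keeps the behavior auditable: the caller can fall back to the untranslated answer instead of returning text with missing citations.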
Proposed approach
- Resolve prompt language at the start of the request.
- Translate only the retrieval query when corpus language requires it.
- Keep response language separate from retrieval language.
- Apply a final language guard after grounded synthesis so output stays in the requested language.
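The four steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the current implementation: `LanguagePlan`, `resolve_language_plan`, `build_retrieval_query`, and `apply_language_guard` are hypothetical names, the language codes are assumed to be short strings such as "ar", "fr", and "en", and the `detect` and `translate` callables stand in for whatever the existing services expose.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical signature: translate(text, source_language, target_language) -> text
Translate = Callable[[str, str, str], str]


@dataclass(frozen=True)
class LanguagePlan:
    prompt_language: str     # detected from the incoming prompt
    retrieval_language: str  # language the retrieval query is issued in
    response_language: str   # language the final answer must be in


def resolve_language_plan(detected: str, corpus_language: str,
                          requested: Optional[str] = None) -> LanguagePlan:
    """Resolve all three languages once, at the start of the request."""
    return LanguagePlan(
        prompt_language=detected,
        retrieval_language=corpus_language,
        # Fall back to the prompt language when no explicit request is made.
        response_language=requested or detected,
    )


def build_retrieval_query(prompt: str, plan: LanguagePlan, translate: Translate) -> str:
    """Translate only the retrieval query, and only when the corpus requires it."""
    if plan.prompt_language == plan.retrieval_language:
        return prompt
    return translate(prompt, plan.prompt_language, plan.retrieval_language)


def apply_language_guard(answer: str, plan: LanguagePlan,
                         detect: Callable[[str], str], translate: Translate) -> str:
    """Final guard: re-translate the answer only if it drifted from the response language."""
    answer_language = detect(answer)
    if answer_language == plan.response_language:
        return answer
    return translate(answer, answer_language, plan.response_language)
```

Keeping the three fields on one immutable object makes it harder to conflate retrieval language with response language further down the pipeline.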
Risks
- Extra translation steps can add latency and cost.
- Language detection errors can degrade retrieval relevance.
- Translation can distort student intent or citation formatting if guardrails are weak.
Open questions
- Should prompt language detection happen in the router, the agent, or the retrieval layer only?
- Which language transitions should be logged for debugging and analytics?
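As input to the logging question, one possible shape for a per-request record is sketched below. The field names, the `language_transitions` logger name, and the `request_id` parameter are all assumptions for discussion, not an agreed schema.

```python
import json
import logging

logger = logging.getLogger("language_transitions")  # hypothetical logger name


def log_language_transitions(request_id: str, prompt_language: str,
                             retrieval_language: str, response_language: str,
                             guard_rewrote_output: bool) -> dict:
    """Emit one structured record per request and return it for inspection."""
    record = {
        "request_id": request_id,
        "prompt_language": prompt_language,
        "retrieval_language": retrieval_language,
        "response_language": response_language,
        # Derived flag: the query was translated if the two languages differ.
        "query_translated": prompt_language != retrieval_language,
        "guard_rewrote_output": guard_rewrote_output,
    }
    logger.info(json.dumps(record, sort_keys=True))
    return record
```

A single flat JSON line per request keeps the transitions easy to filter in debugging and to aggregate for analytics.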
Acceptance criteria
- A plan doc exists for #28 under docs/plans/.
- The doc distinguishes prompt language, retrieval language, and response language.
- The plan stays backend-only and grounded in current code paths.
- The likely service and test files are identified.
Files likely to change
- docs/plans/gh-0028-multilingual-chat-io.md
- app/agents/teacher_agent.py
- app/services/retrieval_pipeline.py
- app/services/llm.py
- app/services/language_context.py
- tests/services/test_retrieval_pipeline.py
- tests/services/test_llm.py
Related issue
#28 - [Backend][Chat] Language detect/translate in + translate out
Status
Backfilled planning stub