Plan: GH-0032 Release Verification and Observability

Context

This issue is the backend release gate: it must show that the scrape -> ingest -> retrieve -> answer pipeline is observable and stable enough to ship.

Problem

A backend release is risky without explicit verification gates, request tracing, metrics, and regression checks that cover the end-to-end path, not just isolated units.

Current state in repo

  • app/core/metrics.py, app/core/logging.py, and app/core/middleware.py provide metrics, structured logs, and request IDs.
  • app/api/routers/metrics.py is registered in the API router and has tests in tests/routers/test_metrics.py.
  • Promptfoo assets and CI workflow exist under promptfoo/ and .github/workflows/promptfoo-rag-eval.yml.
  • Request correlation and hybrid retrieval logging already exist in runtime code.
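For orientation, request ID propagation of the kind listed above can be sketched as a pure-ASGI middleware. This is a minimal illustration, not the code in app/core/middleware.py: the header name `x-request-id`, the class `RequestIdMiddleware`, and the helpers `demo_app` and `call` are all assumptions made for the example.

```python
import asyncio
import uuid

HEADER = b"x-request-id"  # assumed header name, not confirmed by the repo


class RequestIdMiddleware:
    """Reuse an inbound request ID or mint one, and echo it on the response."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        headers = dict(scope.get("headers", []))
        request_id = headers.get(HEADER, b"").decode() or uuid.uuid4().hex
        # Expose the ID to downstream handlers via the scope.
        scope.setdefault("state", {})["request_id"] = request_id

        async def send_with_id(message):
            if message["type"] == "http.response.start":
                response_headers = [*message.get("headers", []), (HEADER, request_id.encode())]
                message = {**message, "headers": response_headers}
            await send(message)

        await self.app(scope, receive, send_with_id)


async def demo_app(scope, receive, send):
    """Hypothetical downstream app that returns a plain 200 response."""
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": b"ok"})


def call(headers):
    """Drive one request through the middleware and return the echoed ID."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {"type": "http", "method": "GET", "path": "/", "headers": headers}
    asyncio.run(RequestIdMiddleware(demo_app)(scope, receive, send))
    return dict(sent[0]["headers"])[HEADER].decode()
```

Calling `call([(HEADER, b"abc-123")])` returns the inbound ID unchanged, while `call([])` returns a freshly generated one. A release check could assert exactly this round trip against the real middleware.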

Target state

  • The backend has a documented release verification path with observable checkpoints.
  • Request IDs, metrics, and retrieval logs are sufficient to trace production failures.
  • Regression tooling covers both service-level and end-to-end backend behavior.
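One way to make logs traceable by request ID is to attach the ID to every structured log line. The sketch below uses only the standard library and is an assumption about shape, not a description of app/core/logging.py: `request_id_var`, `RequestIdFilter`, `JsonFormatter`, `build_logger`, and the field names `hits` and `latency_ms` are hypothetical.

```python
import contextvars
import io
import json
import logging

# Holds the current request's ID; "-" marks log lines emitted outside a request.
request_id_var = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Copy the current request ID onto each log record."""

    def filter(self, record):
        record.request_id = request_id_var.get()
        return True


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "request_id": getattr(record, "request_id", "-"),
        }
        # Optional structured fields passed via extra={"fields": {...}}.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, sort_keys=True)


def build_logger(stream):
    logger = logging.getLogger("retrieval.sketch")
    logger.handlers.clear()
    logger.propagate = False
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    handler.addFilter(RequestIdFilter())
    logger.addHandler(handler)
    return logger


stream = io.StringIO()
log = build_logger(stream)

token = request_id_var.set("req-42")  # normally set by the request middleware
log.info("hybrid retrieval finished", extra={"fields": {"hits": 5, "latency_ms": 18}})
request_id_var.reset(token)

event = json.loads(stream.getvalue())
```

With this shape, a production failure can be traced by filtering logs on a single `request_id` value, and the same assertion can run in a test to confirm the field is present on retrieval log lines.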

Constraints

  • Backend-only scope.
  • Verification should rely on code and CI artifacts in this repo, not a separate admin interface.
  • Release evidence must be usable by both humans and AI agents.
  • Observability additions must not materially degrade runtime performance.

Proposed approach

  1. Treat request ID propagation, metrics endpoints, and structured logs as required release signals.
  2. Use Promptfoo and targeted tests to cover retrieval and multilingual regressions.
  3. Define an end-to-end smoke path that touches scraping, ingestion, and chat where feasible.
  4. Record release-readiness evidence in docs and CI outputs, not only in ad hoc notes.
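The smoke path and evidence recording in steps 3 and 4 could look roughly like the sketch below. It is an illustration under stated assumptions: the stage functions `scrape`, `ingest`, `retrieve`, and `answer` are hypothetical stand-ins that work on an in-memory fixture, so the run does not depend on live external services, and the report layout is invented for the example.

```python
import json
import time

FIXTURE_HTML = "<p>Refunds are accepted within 30 days.</p>"  # hypothetical fixture


def scrape(_):
    # Stand-in for the scraper: returns a fixed page instead of fetching one.
    return FIXTURE_HTML


def ingest(html):
    # Stand-in for ingestion: strip the markup and emit a single chunk.
    return [html.replace("<p>", "").replace("</p>", "")]


def retrieve(chunks):
    # Stand-in for retrieval: keyword match against a fixed smoke query.
    hits = [chunk for chunk in chunks if "refund" in chunk.lower()]
    if not hits:
        raise LookupError("no chunks matched the smoke query")
    return hits


def answer(hits):
    # Stand-in for answer generation.
    return f"Based on {len(hits)} source(s): {hits[0]}"


def run_smoke(stages):
    """Run stages in order, feeding each output forward; stop at the first failure."""
    evidence, payload = [], None
    for name, stage in stages:
        started = time.perf_counter()
        try:
            payload = stage(payload)
            status, error = "passed", None
        except Exception as exc:  # record the failure as evidence instead of crashing
            status, error = "failed", repr(exc)
        evidence.append({
            "stage": name,
            "status": status,
            "error": error,
            "duration_ms": round((time.perf_counter() - started) * 1000, 2),
        })
        if status == "failed":
            break
    ok = len(evidence) == len(stages) and all(e["status"] == "passed" for e in evidence)
    return {"ok": ok, "stages": evidence}


report = run_smoke([
    ("scrape", scrape),
    ("ingest", ingest),
    ("retrieve", retrieve),
    ("answer", answer),
])
report_json = json.dumps(report, indent=2)  # could be uploaded as a CI artifact
```

Emitting the report as JSON keeps the evidence readable by both humans and automated agents, and the per-stage entries show where a failed run stopped.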

Risks

  • Verification can become expensive or flaky if it depends too heavily on live external services.
  • Metrics and logs can exist without answering the most important operational questions.
  • End-to-end checks may lag behind real production data quality.

Open questions

  • Which checks should block merge versus remain informational?
  • Should release evidence live only in CI artifacts, or also be summarized in docs for each release cycle?
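Whatever the answer to the first question, the blocking versus informational split can be made explicit in code. The sketch below only shows the mechanism: the `Check` type, `evaluate_gate`, the check names, and which checks are marked blocking are placeholders, not a decision recorded in this plan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    name: str
    passed: bool
    blocking: bool  # True: a failure blocks the release; False: report only


def evaluate_gate(checks):
    """Split failures into release blockers and informational warnings."""
    blocking_failures = [c.name for c in checks if c.blocking and not c.passed]
    warnings = [c.name for c in checks if not c.blocking and not c.passed]
    return {
        "release_ok": not blocking_failures,
        "blocking_failures": blocking_failures,
        "warnings": warnings,
    }


# Placeholder classification, for illustration only.
checks = [
    Check("request_id_round_trip", passed=True, blocking=True),
    Check("metrics_endpoint", passed=True, blocking=True),
    Check("promptfoo_rag_eval", passed=False, blocking=False),
]

result = evaluate_gate(checks)
```

In this example the gate still passes because the only failing check is informational. A CI step could exit non-zero when `release_ok` is false and print `warnings` otherwise.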

Acceptance criteria

  • A plan doc exists for #32 under docs/plans/.
  • The doc defines observability and end-to-end verification as the release gate.
  • The doc names current metrics, logging, request ID, and Promptfoo touchpoints.
  • The plan remains backend-only.

Files likely to change

  • docs/plans/gh-0032-release-verification-observability.md
  • app/core/metrics.py
  • app/core/logging.py
  • app/core/middleware.py
  • app/api/routers/metrics.py
  • tests/routers/test_metrics.py
  • promptfoo/promptfooconfig.yaml
  • .github/workflows/promptfoo-rag-eval.yml

Linked issue

  • #32 - [Backend][Verify] Observability + end-to-end release verification

Status

Backfilled planning stub