
BacMR Backend Architecture

Status: Production-ready core, RAG integration complete
Last Updated: 2026-02-22
Audience: Developers and operators


Overview

BacMR is a RAG-powered educational platform for the Mauritanian Baccalaureate curriculum. The backend is built with FastAPI, PostgreSQL (Supabase), Pinecone vector search, and OpenAI models orchestrated through LangGraph.

Core Philosophy: Production services exist in the main branch. This document describes what is actually implemented and deployed, not planned features.


Tech Stack

Component | Technology | Purpose
--- | --- | ---
API Framework | FastAPI | REST API, dependency injection, async support
Database | PostgreSQL (Supabase) | User data, chunks, billing, audit logs
Vector Store | Pinecone | Embedding search (1536-dim, cosine)
LLM Provider | OpenAI | Embeddings (text-embedding-3-small), chat (gpt-4o), reranking (gpt-4o-mini)
Agent Framework | LangGraph | Teacher agent orchestration
Auth | Supabase Auth | JWT + custom claims via Postgres hook
Cache | In-memory LRU | Rerank results (15 min), chunk text (1 hr)

Architecture Overview

Client (Next.js)
    ↓ HTTPS + JWT
FastAPI Gateway (CORS, Auth)
Service Layer (18 services)
    ├─ Auth & Wallet
    ├─ Retrieval Pipeline (RAG)
    ├─ Ingestion Pipeline
    ├─ Scraper & Quality
    └─ Observability
Data Stores
    ├─ Postgres (canonical chunks, users, billing)
    ├─ Pinecone (vectors + lightweight metadata)
    └─ In-memory (cache)

Key Flows:

  1. Chat: Reserve tokens → Retrieve (Pinecone) → Rerank (GPT-mini) → Answer (GPT-4o + LangGraph) → Finalize billing
  2. Ingestion: Upload PDF → Parse → Chunk (deterministic IDs) → Embed → Upsert (Pinecone + Postgres)
  3. Scraping: Fetch → Canonicalize → Dedupe (SimHash) → Quality check → Store


Service Layer (18 Services)

Authentication & Security

  • Auth Service (app/core/auth.py): JWT verification, role extraction from app_metadata.role (custom claims), admin guard
  • Request Middleware (app/core/middleware.py): Request-ID propagation (UUID v4), rate limiting (per-user/per-IP)

Retrieval & RAG

  • Retrieval Pipeline (app/services/retrieval_pipeline.py): Dense search → rerank → fetch chunks. Fully integrated with chat router.
  • GPT-mini Service (app/services/gpt_mini.py): Reranking, language detection, query translation
  • Pinecone Adapter (app/services/pinecone_adapter.py): Vector upsert/query with lightweight metadata
  • Embedding Service (app/services/embedding_service.py): Generate embeddings, track refs, upsert to Pinecone
  • Cache Service (app/services/cache.py): Dual LRU (rerank 15-min TTL, chunks 1-hr TTL)
  • Teacher Agent (app/agents/teacher_agent.py): LangGraph orchestration, wallet check → decide path → retrieve → answer

Billing & Wallet

  • Wallet Reservation Service (app/services/wallet_reservation.py): Reserve/finalize pattern, atomic transactions
  • Tier Config (app/services/tier_config.py): Free/Standard/Premium limits (top-K, rerank-N)

Ingestion

  • Ingestion Service (app/services/ingestion.py): State machine (queued → parsing → embedding → ready/failed), retry logic
  • Chunking Service (app/services/chunking.py): Deterministic chunk IDs (sha256(file_id:page:chunk_index)), token-based splitting
  • PDF Processor (app/services/pdf_processor.py): Extract text from PDFs
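
The ingestion job lifecycle can be expressed as an explicit transition table, so that illegal jumps (for example queued → ready) fail loudly and every transition leaves an audit record. This is a minimal sketch, not the code in app/services/ingestion.py: `IngestionJob`, `TRANSITIONS`, and `MAX_RETRIES = 3` are hypothetical, and it assumes a retry moves a job from `failed` back to `queued`.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Documented states: queued → parsing → embedding → ready/failed.
# Assumption: a failed job is retried by moving it back to queued.
TRANSITIONS = {
    "queued": {"parsing", "failed"},
    "parsing": {"embedding", "failed"},
    "embedding": {"ready", "failed"},
    "failed": {"queued"},
    "ready": set(),
}
MAX_RETRIES = 3  # hypothetical limit; not stated in this document


@dataclass
class IngestionJob:
    """Hypothetical in-memory stand-in for an ingestion_jobs row."""

    file_id: str
    status: str = "queued"
    retry_count: int = 0
    # (from_status, to_status, error) tuples, mirroring ingestion_audit rows
    audit: List[Tuple[str, str, Optional[str]]] = field(default_factory=list)

    def transition(self, new_status: str, error: Optional[str] = None) -> None:
        if new_status not in TRANSITIONS[self.status]:
            raise ValueError(f"illegal transition {self.status} -> {new_status}")
        if self.status == "failed" and new_status == "queued":
            if self.retry_count >= MAX_RETRIES:
                raise ValueError("retry limit reached")
            self.retry_count += 1
        self.audit.append((self.status, new_status, error))
        self.status = new_status
```

For example, `transition("parsing")`, then `transition("failed", "PDF parse error")`, then `transition("queued")` leaves the job queued with `retry_count == 1` and three audit entries.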

Scraping & Quality

  • Scraper Service (app/services/scraper_service.py): Automated pipeline: canonicalize → dedupe → quality check → insert
  • Text Normalizer (app/services/text_normalizer.py): Arabic canonicalization (alef unification, tatweel removal)
  • Deduplication (app/services/deduplication.py): SimHash (64-bit) with Hamming distance ≤ 3
  • Quality Checker (app/services/quality_checker.py): Min length, OCR confidence, encoding validation
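
The canonicalize → dedupe steps can be sketched with the standard library alone. This assumes word-level SimHash features weighted by frequency and a BLAKE2b-based 64-bit token hash; the production services may tokenize differently (for example character shingles) and normalize more aggressively. `canonicalize`, `simhash64`, and `is_near_duplicate` are hypothetical names.

```python
import hashlib
import re
from collections import Counter

# Alef variants (أ U+0623, إ U+0625, آ U+0622) → bare alef ا (U+0627)
ALEF_MAP = str.maketrans({"\u0623": "\u0627", "\u0625": "\u0627", "\u0622": "\u0627"})
TATWEEL = "\u0640"


def canonicalize(text: str) -> str:
    """Unify alef variants, drop tatweel, collapse whitespace."""
    text = text.translate(ALEF_MAP).replace(TATWEEL, "")
    return re.sub(r"\s+", " ", text).strip()


def _hash64(token: str) -> int:
    # Stable across processes, unlike the built-in hash(), which is salted.
    digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def simhash64(text: str) -> int:
    """64-bit SimHash over canonicalized word tokens, weighted by frequency."""
    weights = [0] * 64
    for token, count in Counter(canonicalize(text).split()).items():
        h = _hash64(token)
        for bit in range(64):
            weights[bit] += count if (h >> bit) & 1 else -count
    return sum(1 << bit for bit in range(64) if weights[bit] > 0)


def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


def is_near_duplicate(a: str, b: str, max_distance: int = 3) -> bool:
    return hamming(simhash64(a), simhash64(b)) <= max_distance
```

Texts that differ only in alef forms, tatweel, or whitespace canonicalize to the same token set and so get identical fingerprints (distance 0); the ≤ 3 threshold is meant to also catch near-duplicates with small edits.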

Features

  • Quiz Generator (app/services/quiz_generator.py): RAG-based quiz generation (service ready, router stub exists)
  • Upload Service (app/services/upload.py): Presigned URL generation for S3/GCS

Observability

  • Circuit Breaker (app/services/circuit_breaker.py): Protect OpenAI/Pinecone (3 failures → open, 120s recovery)
  • Metrics (app/core/metrics.py): Prometheus-compatible counters/histograms/gauges
  • Logging (app/core/logging.py): Structured JSON logs with request-ID propagation

Router Organization

Router | Endpoints | Status | Notes
--- | --- | --- | ---
/auth | signup, login | ✅ Working | Supabase delegation
/me | profile | ✅ Working | User profile
/chat | POST /chat | ✅ Working | SSE streaming, LangGraph agent
/wallet | balance, reserve, finalize | ✅ Working | Reservation pattern
/admin | users, roles | ✅ Working | JWT admin role required
/metrics | scrape, debug JSON, health | ✅ Working | /metrics is the scrape path (no auth); /metrics/prometheus and /metrics/json are admin-only
/quizzes | generate | ⚠️ Stub | Service ready, router needs wiring
/scraping | sync | ⚠️ Partial | Service ready, needs full integration
/curriculum | references | ✅ Working | Curriculum metadata

Not Included in API Router (remaining gaps):

  • Quiz router exists but is not wired to the /quizzes/generate implementation
  • No ingestion router (admins upload via the frontend; no direct API)


Database Schema (PostgreSQL)

Core Tables

  • profiles: User profiles, role (student/teacher/admin), hint_level, created_at
  • wallet: token_balance, subscription_tier (free/standard/premium)
  • wallet_ledger: delta, reason, request_id, reservation_id (audit trail)
  • reservations: estimated, actual, status (reserved/finalized/expired), request_id, expires_at

Content Tables

  • chunks: chunk_id (deterministic SHA256), file_id, page_number, content, token_count, language
  • embedding_refs: chunk_id → pinecone_vector_id, namespace tracking
  • documents: file metadata (uploaded PDFs)
  • references: curriculum references, canonical_id (deduplication), content_fingerprint (SimHash)

Ingestion

  • ingestion_jobs: status (queued/parsing/embedding/ready/failed), retry_count, chunks_created, vectors_upserted
  • ingestion_audit: state transitions, error messages

Scraping

  • scrape_runs: timestamp, source, status

RLS Status: Enabled on all public tables. Service role bypasses RLS. Custom JWT claims implemented via Postgres hook.


Authentication Flow

JWT + Custom Claims

  1. User logs in via Supabase Auth → receives JWT
  2. Postgres hook (custom_access_token_hook) injects app_metadata.role from profiles table into JWT
  3. FastAPI verifies JWT, extracts role from app_metadata.role
  4. Role-based access control: student/teacher/admin

Current State:

  • Custom claims hook implemented (migration 18)
  • Auth service reads from app_metadata.role (with user_metadata.role fallback)

Roles:

  • student: Default; access to chat/quiz/wallet
  • teacher: Same as student (future: analytics access)
  • admin: Full access to admin endpoints (user management, scraping, ingestion)
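
The role rule described above (prefer app_metadata.role, fall back to user_metadata.role, default to student) reduces to a few lines over the decoded JWT claims. A sketch assuming the token has already been verified and decoded into a dict; `extract_role` and `require_admin` are hypothetical names, not the functions in app/core/auth.py.

```python
from typing import Any, Dict

VALID_ROLES = {"student", "teacher", "admin"}


def extract_role(claims: Dict[str, Any]) -> str:
    """Prefer app_metadata.role (set by the custom claims hook), then user_metadata.role."""
    for source in ("app_metadata", "user_metadata"):
        role = (claims.get(source) or {}).get("role")
        if role in VALID_ROLES:
            return role
    return "student"  # documented default role


def require_admin(claims: Dict[str, Any]) -> None:
    """Admin guard: reject any caller whose resolved role is not admin."""
    if extract_role(claims) != "admin":
        raise PermissionError("admin role required")
```

One point worth checking against the real implementation: Supabase lets users edit their own user_metadata, so a fallback that honors user_metadata.role lets a user without an app_metadata role claim any role, including admin. Restricting or removing the fallback closes that gap.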


RAG Pipeline (Retrieval + LangGraph Agent)

Flow

User Question
LangGraph Agent (teacher_agent.py)
    ├─ check_wallet: Reserve tokens
    ├─ decide_path: Simple answer or RAG needed?
    └─ retrieve_context: If RAG needed
Retrieval Pipeline
    ├─ detect_language (GPT-mini)
    ├─ translate_query (if needed)
    ├─ embed_query (OpenAI)
    ├─ dense_search (Pinecone top-K)
    ├─ rerank (GPT-mini top-N, cached)
    └─ fetch_chunks (cache → Postgres)
Teacher Agent
    ├─ generate_answer (GPT-4o, streaming)
    └─ finalize (deduct actual tokens, log usage)
SSE Stream to Client

Tier-Based Limits

Tier | Dense Top-K | Rerank Top-N | Rerank Cache TTL
--- | --- | --- | ---
Free | 10 | 3 | 15 min
Standard | 20 | 5 | 15 min
Premium | 30 | 8 | 15 min
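
The retrieval steps and tier limits fit together roughly as follows. A sketch only: the callables stand in for the OpenAI, Pinecone, GPT-mini, and cache/Postgres calls, language detection and translation are omitted, and `retrieve` and `TierLimits` are hypothetical names. It shows where the tier's top-K and top-N apply, plus the documented fallback to dense-retrieval order when reranking fails.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class TierLimits:
    dense_top_k: int   # candidates fetched from Pinecone
    rerank_top_n: int  # chunks kept after reranking


TIERS: Dict[str, TierLimits] = {
    "free": TierLimits(dense_top_k=10, rerank_top_n=3),
    "standard": TierLimits(dense_top_k=20, rerank_top_n=5),
    "premium": TierLimits(dense_top_k=30, rerank_top_n=8),
}


def retrieve(
    query: str,
    tier: str,
    embed: Callable[[str], List[float]],
    dense_search: Callable[[List[float], int], List[str]],  # -> chunk_ids, best first
    rerank: Callable[[str, List[str]], List[str]],          # -> chunk_ids, reordered
    fetch_chunks: Callable[[List[str]], List[str]],         # chunk_ids -> chunk texts
) -> List[str]:
    limits = TIERS[tier]
    candidates = dense_search(embed(query), limits.dense_top_k)
    try:
        ranked = rerank(query, candidates)
    except Exception:
        ranked = candidates  # documented fallback: keep dense-retrieval order
    return fetch_chunks(ranked[: limits.rerank_top_n])
```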

What Goes Where

  • Full chunk text: Postgres chunks.content (source of truth)
  • Vectors: Pinecone (1536-dim, cosine similarity)
  • Metadata in Pinecone: chunk_id, file_id, language, grade, subject, page_number, ingestion_ts (< 1 KB per vector)
  • Cached rerank results: In-memory LRU (key: sha256(query+namespace+tier))
  • Cached chunk text: In-memory LRU (key: chunk_id)

Wallet Reservation Pattern

Problem: LLM calls have unpredictable token usage. Pre-deducting max tokens wastes balance; post-deducting risks overdrafts.

Solution: A two-phase reserve/finalize flow, with each phase in its own database transaction.

Reserve (Before LLM Call)

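BEGIN;
    -- Deduct only if the balance covers the estimate.
    -- The application checks the row count: 0 rows means insufficient balance → ROLLBACK.
    UPDATE wallet SET token_balance = token_balance - :estimated
        WHERE user_id = :uid AND token_balance >= :estimated;
    INSERT INTO reservations (user_id, estimated, status, request_id, expires_at)
        VALUES (:uid, :estimated, 'reserved', :request_id, now() + interval '5 minutes')
        RETURNING id;
COMMIT;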

Finalize (After LLM Call)

BEGIN;
    -- Only an open reservation can be finalized; 0 rows (already finalized or expired) → ROLLBACK.
    UPDATE reservations SET actual = :actual, status = 'finalized'
        WHERE id = :res_id AND status = 'reserved';
    UPDATE wallet SET token_balance = token_balance + (:estimated - :actual) WHERE user_id = :uid;
    INSERT INTO wallet_ledger (user_id, delta, reason, request_id, reservation_id)
        VALUES (:uid, -:actual, 'agent_chat', :request_id, :res_id);
COMMIT;

Expiry (Background Job)

Every 60 seconds: Expire un-finalized reservations > 5 min old, refund tokens.

Reconciliation (Daily)

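Compare wallet.token_balance with SUM(wallet_ledger.delta), netting out tokens held by open reservations (the reserve step deducts from the balance without writing a ledger row). Flag discrepancies; no auto-correction.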
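
The reserve, finalize, expiry, and reconciliation steps can be exercised end to end with SQLite standing in for Postgres (`?` placeholders instead of `:name`, epoch seconds instead of timestamptz, `now` passed in explicitly, request_id columns omitted). This is a sketch under those assumptions, not the code in app/services/wallet_reservation.py or the scripts; `open_wallet` and `reconcile` are hypothetical helpers. Two guards matter: reserve refuses to overdraw, and finalize only touches a reservation still in `reserved` status, so a late finalize after expiry cannot refund the estimate twice.

```python
import sqlite3
from contextlib import contextmanager

RESERVATION_TTL = 300  # seconds (5 minutes)


def make_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:", isolation_level=None)  # explicit BEGIN/COMMIT below
    db.executescript("""
        CREATE TABLE wallet (user_id TEXT PRIMARY KEY, token_balance INTEGER NOT NULL);
        CREATE TABLE reservations (id INTEGER PRIMARY KEY, user_id TEXT, estimated INTEGER,
                                   actual INTEGER, status TEXT, expires_at REAL);
        CREATE TABLE wallet_ledger (id INTEGER PRIMARY KEY, user_id TEXT, delta INTEGER,
                                    reason TEXT, reservation_id INTEGER);
    """)
    return db


@contextmanager
def tx(db: sqlite3.Connection):
    """Run the block in one transaction; roll back on any exception."""
    db.execute("BEGIN IMMEDIATE")
    try:
        yield
    except BaseException:
        db.execute("ROLLBACK")
        raise
    db.execute("COMMIT")


def open_wallet(db, uid: str, amount: int) -> None:
    with tx(db):  # initial credit goes into the ledger so reconciliation balances
        db.execute("INSERT INTO wallet VALUES (?, ?)", (uid, amount))
        db.execute("INSERT INTO wallet_ledger (user_id, delta, reason) VALUES (?, ?, 'top_up')",
                   (uid, amount))


def reserve(db, uid: str, estimated: int, now: float) -> int:
    with tx(db):
        cur = db.execute("UPDATE wallet SET token_balance = token_balance - ? "
                         "WHERE user_id = ? AND token_balance >= ?", (estimated, uid, estimated))
        if cur.rowcount != 1:
            raise ValueError("insufficient balance")
        cur = db.execute("INSERT INTO reservations (user_id, estimated, status, expires_at) "
                         "VALUES (?, ?, 'reserved', ?)", (uid, estimated, now + RESERVATION_TTL))
        return cur.lastrowid


def finalize(db, res_id: int, actual: int) -> None:
    with tx(db):
        row = db.execute("SELECT user_id, estimated FROM reservations "
                         "WHERE id = ? AND status = 'reserved'", (res_id,)).fetchone()
        if row is None:
            raise ValueError("reservation already finalized or expired")
        uid, estimated = row
        db.execute("UPDATE reservations SET actual = ?, status = 'finalized' WHERE id = ?",
                   (actual, res_id))
        db.execute("UPDATE wallet SET token_balance = token_balance + ? WHERE user_id = ?",
                   (estimated - actual, uid))
        db.execute("INSERT INTO wallet_ledger (user_id, delta, reason, reservation_id) "
                   "VALUES (?, ?, 'agent_chat', ?)", (uid, -actual, res_id))


def expire(db, now: float) -> int:
    """Refund and close reservations past expires_at; returns how many were expired."""
    with tx(db):
        rows = db.execute("SELECT id, user_id, estimated FROM reservations "
                          "WHERE status = 'reserved' AND expires_at < ?", (now,)).fetchall()
        for res_id, uid, estimated in rows:
            db.execute("UPDATE wallet SET token_balance = token_balance + ? WHERE user_id = ?",
                       (estimated, uid))
            db.execute("UPDATE reservations SET status = 'expired' WHERE id = ?", (res_id,))
    return len(rows)


def reconcile(db) -> list:
    """User IDs whose balance != ledger sum minus tokens held by open reservations."""
    rows = db.execute("""
        SELECT w.user_id, w.token_balance,
               COALESCE((SELECT SUM(delta) FROM wallet_ledger l WHERE l.user_id = w.user_id), 0)
             - COALESCE((SELECT SUM(estimated) FROM reservations r
                         WHERE r.user_id = w.user_id AND r.status = 'reserved'), 0)
        FROM wallet w
    """).fetchall()
    return [uid for uid, balance, expected in rows if balance != expected]
```

In Postgres, the conditional `UPDATE ... WHERE status = 'reserved'` (checking the affected row count, or using RETURNING) gives the same single-finalize guarantee without a separate SELECT.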


Chunking Strategy

Deterministic Chunk IDs

chunk_id = sha256(file_id + ":" + page_number + ":" + chunk_index)
Benefit: Re-ingesting the same file produces identical chunk IDs, so Pinecone upserts are idempotent.
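
The ID formula translates directly to Python. The sketch assumes UTF-8 encoding of the key and a hex digest, neither of which this document specifies; `chunk_id` is a hypothetical helper name.

```python
import hashlib


def chunk_id(file_id: str, page_number: int, chunk_index: int) -> str:
    """sha256 over "file_id:page_number:chunk_index", hex-encoded (64 chars)."""
    key = f"{file_id}:{page_number}:{chunk_index}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()
```

Because the ID depends only on position, not content, re-ingesting an edited file overwrites the vectors at the same positions; if the new version yields fewer chunks, the leftover old IDs would need explicit deletion.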

Token-Based Chunking

Language | Chunk Size | Overlap | Tokenizer
--- | --- | --- | ---
French | 512 tokens | 64 tokens | tiktoken cl100k_base
Arabic (MSA) | 384 tokens | 48 tokens | tiktoken cl100k_base
Hassaniya | 384 tokens | 48 tokens | tiktoken cl100k_base

Rationale: Arabic tokenizes at roughly 1.5× expansion under cl100k_base; smaller chunks help maintain retrieval quality.
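
Token-based splitting with overlap is a sliding window over token IDs. This sketch is tokenizer-agnostic; the `CHUNK_PARAMS` keys and `split_tokens` are hypothetical, and the real chunking service may also respect sentence or page boundaries.

```python
from typing import List, Sequence

# (chunk_size, overlap) in tokens, from the table above; language keys are hypothetical.
CHUNK_PARAMS = {"fr": (512, 64), "ar": (384, 48), "hassaniya": (384, 48)}


def split_tokens(tokens: Sequence[int], chunk_size: int, overlap: int) -> List[List[int]]:
    """Each window starts (chunk_size - overlap) tokens after the previous one."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    if not tokens:
        return []
    step = chunk_size - overlap
    # Stop once a window would contain only tokens already covered by the previous overlap.
    last_start = max(len(tokens) - overlap, 1)
    return [list(tokens[start:start + chunk_size]) for start in range(0, last_start, step)]
```

With tiktoken, `enc = tiktoken.get_encoding("cl100k_base")` and `enc.encode(text)` produce the token list, and `enc.decode(window)` turns each window back into text. Window edges can fall inside a multi-byte Arabic character, which decode renders as a replacement character, so boundary handling is worth checking.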


Design Patterns

Circuit Breaker

  • Protects: OpenAI API, Pinecone API
  • Threshold: 3 failures in 60s → open circuit
  • Recovery: 120s timeout, then half-open (1 test request)
  • Fallback:
      • Rerank failure → use dense-retrieval order
      • Embedding failure → queue for retry
      • Chat failure → return 503 to client
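
A minimal single-threaded sketch of the breaker policy above: 3 failures within 60 s open the circuit, and after 120 s a trial call is let through. `CircuitBreaker` and `CircuitOpenError` are hypothetical names, the clock is injectable for testing, and unlike a production breaker it does not stop concurrent callers from all probing in the half-open state.

```python
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised instead of calling the protected service while the circuit is open."""


class CircuitBreaker:
    def __init__(self, threshold: int = 3, window: float = 60.0, recovery: float = 120.0,
                 clock: Callable[[], float] = time.monotonic):
        self.threshold = threshold  # failures needed to open
        self.window = window        # seconds over which failures are counted
        self.recovery = recovery    # seconds to stay open before a trial call
        self.clock = clock
        self.failures: List[float] = []
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.recovery:
            return "half_open"
        return "open"

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            raise CircuitOpenError("circuit open; skipping call")
        try:
            result = fn()
        except Exception:
            self._record_failure()
            raise
        # Any success, including the half-open trial call, closes the circuit.
        self.failures.clear()
        self.opened_at = None
        return result

    def _record_failure(self) -> None:
        now = self.clock()
        if self.state == "half_open":
            self.opened_at = now  # trial call failed: stay open for another recovery period
            return
        self.failures = [t for t in self.failures if now - t < self.window] + [now]
        if len(self.failures) >= self.threshold:
            self.opened_at = now
```

The rerank fallback then becomes: call the reranker through `breaker.call(...)` and, on any exception (including `CircuitOpenError`), keep the dense-retrieval order.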

Caching

  • Rerank Cache: Key = sha256(query + namespace + tier), TTL = 15 min
  • Chunk Cache: Key = chunk_id, TTL = 1 hr
  • Invalidation: On re-ingestion of any file in namespace
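
Both caches can be modelled as an LRU map whose entries also carry an expiry time. A sketch assuming `OrderedDict`-based eviction and arbitrary size limits (the document gives TTLs but no sizes); `TTLLRUCache` and `rerank_cache_key` are hypothetical. The key helper joins its parts with a separator before hashing, a small deviation from plain `query + namespace + tier` concatenation, so that different splits of the same characters cannot produce the same key.

```python
import hashlib
import time
from collections import OrderedDict
from typing import Any, Callable, Hashable, Optional, Tuple


class TTLLRUCache:
    """LRU cache whose entries also expire `ttl` seconds after they are written."""

    def __init__(self, maxsize: int, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.maxsize = maxsize
        self.ttl = ttl
        self.clock = clock
        self._data: "OrderedDict[Hashable, Tuple[Any, float]]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self.clock() >= expires_at:
            del self._data[key]  # expired: drop lazily on read
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def set(self, key: Hashable, value: Any) -> None:
        self._data[key] = (value, self.clock() + self.ttl)
        self._data.move_to_end(key)
        while len(self._data) > self.maxsize:
            self._data.popitem(last=False)  # evict the least recently used entry

    def clear(self) -> None:
        self._data.clear()


def rerank_cache_key(query: str, namespace: str, tier: str) -> str:
    return hashlib.sha256("\x1f".join((query, namespace, tier)).encode("utf-8")).hexdigest()


rerank_cache = TTLLRUCache(maxsize=1024, ttl=15 * 60)   # sizes are assumptions
chunk_cache = TTLLRUCache(maxsize=10_000, ttl=60 * 60)
```

Since the rerank keys are hashes, dropping only one namespace's entries on re-ingestion needs a side index; clearing the whole rerank cache is the simple option. The chunk cache needs the same care, because deterministic chunk IDs mean a re-ingested chunk keeps its old key.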

Request-ID Propagation

  • Every request gets UUID v4 request_id
  • Propagated to:
      • All log lines (structured JSON)
      • reservations.request_id
      • wallet_ledger.request_id
      • OpenAI API calls (user parameter)
      • SSE done event
      • Error responses
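
One common way to carry a request ID through async code and into every log line is a `contextvars.ContextVar` plus a logging filter. A sketch; `request_id_var`, `new_request_id`, `RequestIdFilter`, and `JsonFormatter` are hypothetical, not the contents of app/core/middleware.py or app/core/logging.py.

```python
import contextvars
import json
import logging
import uuid

request_id_var = contextvars.ContextVar("request_id", default="-")


def new_request_id() -> str:
    """Generate a UUID v4 and bind it to the current context."""
    rid = str(uuid.uuid4())
    request_id_var.set(rid)
    return rid


class RequestIdFilter(logging.Filter):
    """Attach the current request ID to every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id_var.get()
        return True


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log line."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "message": record.getMessage(),
            "request_id": getattr(record, "request_id", "-"),
        })
```

A request middleware would call `new_request_id()` at the start of each request; code running in the same asyncio task, and tasks spawned from it, then sees the value via `request_id_var.get()` (for reservation rows, ledger writes, the OpenAI `user` parameter, and the SSE done event) without passing it explicitly.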

Rate Limiting

Scope | Limit | Window | Applies To
--- | --- | --- | ---
Per-user | 10 req | 1 min | /chat, /ask, /quizzes/generate
Per-user | 30 req | 1 min | /wallet/, /upload/
Per-IP | 5 req | 1 min | /auth/signup, /auth/login
Per-user (admin) | 60 req | 1 min | /admin/, /ingestion/

Implementation: In-memory sliding window, valid for a single instance only. Multi-instance deployments need a shared backend such as Redis.
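
The in-memory sliding window amounts to a per-key log of recent request timestamps. A sketch: the route-group names in `LIMITS` and the `SlidingWindowLimiter` class are hypothetical, and the state lives in one process, which is why multiple instances would need a shared store.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Tuple

# (limit, window_seconds) per route group, from the table above; group names are hypothetical.
LIMITS: Dict[str, Tuple[int, float]] = {
    "chat": (10, 60.0),
    "wallet_upload": (30, 60.0),
    "auth_ip": (5, 60.0),
    "admin": (60, 60.0),
}


class SlidingWindowLimiter:
    """Keep timestamps of accepted requests per (group, subject); drop those older than the window."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self.clock = clock
        self._hits: Dict[Tuple[str, str], Deque[float]] = defaultdict(deque)

    def allow(self, group: str, subject: str) -> bool:
        limit, window = LIMITS[group]
        now = self.clock()
        hits = self._hits[(group, subject)]
        while hits and now - hits[0] >= window:
            hits.popleft()  # outside the window: no longer counts
        if len(hits) >= limit:
            return False
        hits.append(now)
        return True
```

A dependency or middleware would call `limiter.allow("chat", user_id)` or `limiter.allow("auth_ip", client_ip)` and return HTTP 429 when it is False.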


What's NOT Wired Up

Missing Integrations

  1. Quiz Router: QuizGeneratorService exists, but /quizzes/generate returns stub response
  2. Ingestion Router: No API endpoint for admin PDF upload (frontend handles via presigned URLs)
  3. Scraper Admin: Scraper service exists but admin UI integration incomplete

Not Implemented

  • Multi-instance deployment (rate limiter needs Redis)
  • Blob storage integration (presigned URLs exist but not tested)
  • Quiz persistence (quizzes generated but not saved to DB)
  • Usage analytics dashboard
  • Automated reindexing (script exists but no cron)

Background Jobs

Job | File | Schedule | Status
--- | --- | --- | ---
Reservation Expiry | scripts/expire_reservations.py | Continuous (60 s loop) | Ready, needs deployment
Wallet Reconciliation | scripts/reconcile_wallets.py | Daily, 2 AM | Ready, needs cron setup
DR Export | scripts/export_chunks.py | Weekly, Sunday 3 AM | Ready, needs cron setup
Reindex | scripts/reindex.py | On demand | Ready for manual trigger

Key Metrics

Available (Prometheus format + JSON):

  • ingestion_job_duration_seconds (histogram)
  • ingestion_job_status_total (counter)
  • openai_request_duration_seconds (histogram)
  • openai_tokens_used_total (counter)
  • wallet_reservation_total (counter)
  • circuit_breaker_state (gauge: 0=closed, 1=open)
  • http_request_duration_seconds (histogram)

Collection: app/core/metrics.py (singleton MetricsCollector)

Export & Auth:

  • GET /metrics: Prometheus scrape endpoint (no auth), intended for internal/allowlisted Prometheus pull access.
  • GET /metrics/prometheus: Prometheus format with admin auth (manual/diagnostic access).
  • GET /metrics/json: Structured JSON with admin auth (debug view).

Verification: Router wiring in app/api/router.py; endpoint behavior is covered by tests/routers/test_metrics.py (public scrape + admin-only JSON/Prometheus views).


Security

Secrets Management

  • Current: .env file for local development
  • Production: Use cloud secret manager (GCP/AWS/Azure) or Render environment groups
  • Never commit: OPENAI_API_KEY, PINECONE_API_KEY, SUPABASE_SERVICE_ROLE_KEY

RLS (Row-Level Security)

  • Enabled on: profiles, wallet, wallet_ledger, reservations, usage_logs
  • System tables: Only service_role can access (chunks, ingestion_jobs, embedding_refs)
  • Admin tables: Admin role + service_role (references, scrape_runs)

PII Redaction

  • Strip emails/phone numbers before sending to OpenAI
  • Don't send user_id or wallet balance to OpenAI
  • Log redacted versions in usage_logs
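
A regex pass is the simplest form of the redaction step. The patterns are illustrative assumptions, not the production rules: the phone pattern treats any run of 8 or more digits (optionally with a leading +, spaces, dots, or dashes) as a phone number, which will also catch long numbers in math questions, so the threshold is a trade-off. `redact_pii` is a hypothetical name.

```python
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Assumed phone shape: optional +, then at least 8 digits with optional single separators.
PHONE_RE = re.compile(r"\+?\d(?:[\s.-]?\d){7,}")


def redact_pii(text: str) -> str:
    """Replace emails and phone-like digit runs with placeholders before sending to OpenAI."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)
```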

Testing Strategy

Working Flows (Tested)

  • Auth: signup → login → profile → JWT verification
  • Wallet: balance → reserve → finalize → ledger entry
  • Chat: reserve → retrieve → rerank → answer (SSE stream) → finalize
  • Admin: list users → update role → verify custom claims

Integration Tests Needed

  • Ingestion: upload → parse → chunk → embed → upsert
  • Quiz: reserve → retrieve → generate → finalize
  • Scraper: sync → canonicalize → dedupe → quality → insert

Postman Collection

  • 40+ endpoints documented
  • 10 testing workflows
  • Auto-capture JWT, request-ID, reservation-ID

Dependency Injection Pattern

File: app/core/dependencies.py

All services are instantiated as module-level singletons, constructed in dependency order:

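# 0. Imports (app module paths follow the Service Layer file list; the settings path is assumed)
from openai import OpenAI
from supabase import create_client

from app.core.config import settings  # assumed location of the settings object
from app.services.cache import CacheService
from app.services.embedding_service import EmbeddingService
from app.services.gpt_mini import GPTMiniService
from app.services.pinecone_adapter import PineconeAdapter
from app.services.quiz_generator import QuizGeneratorService
from app.services.retrieval_pipeline import RetrievalPipeline
from app.services.wallet_reservation import WalletReservationService

# 1. External Clients
openai_client = OpenAI(api_key=settings.OPENAI_API_KEY)
supabase_service = create_client(...)

# 2. Adapters
pinecone_adapter = PineconeAdapter(...)
cache_service = CacheService()

# 3. Core Services (each receives the clients/adapters built above)
embedding_service = EmbeddingService(openai_client, supabase_service, pinecone_adapter)
gpt_mini_service = GPTMiniService(openai_client)
wallet_service = WalletReservationService(supabase_service)

# 4. Pipelines
retrieval_pipeline = RetrievalPipeline(
    openai_client, supabase_service, pinecone_adapter,
    embedding_service, gpt_mini_service, cache_service
)

# 5. Features
quiz_generator = QuizGeneratorService(openai_client, retrieval_pipeline)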

Usage in Routers:

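from fastapi import APIRouter, Depends

from app.core.auth import get_current_user  # assumed: auth service lives in app/core/auth.py
from app.core.dependencies import wallet_service

router = APIRouter(prefix="/wallet")

@router.get("/balance")
async def get_balance(user: dict = Depends(get_current_user)):
    return wallet_service.get_balance(user["id"])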


Migration Summary

Applied Migrations (main branch)

  • 12: ingestion_jobs table
  • 13: chunks enhanced (deterministic IDs)
  • 14: reservations table
  • 15: embedding_refs tracking
  • 16: RLS for new tables
  • 17: references enhancements (SimHash, canonical_id)
  • 18: JWT custom claims hook
  • 19: Update RLS for JWT claims
  • 20: transactions table
  • 21: documents enrichment
  • 22: Fix RLS for service_role

Deployment Checklist

Environment Variables (Required)

OPENAI_API_KEY=sk-...
PINECONE_API_KEY=...
SUPABASE_URL=https://....supabase.co
SUPABASE_SERVICE_ROLE_KEY=...
PINECONE_INDEX_NAME=curriculum-1536
ENV=prod
CORS_ALLOWED_ORIGINS=https://app.bacmr.mr
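
Failing fast at startup when a required variable is missing or blank avoids discovering it on the first chat request. A sketch; `REQUIRED_ENV` mirrors the list above and `missing_env` is a hypothetical helper.

```python
import os
from typing import List, Mapping

REQUIRED_ENV = (
    "OPENAI_API_KEY",
    "PINECONE_API_KEY",
    "SUPABASE_URL",
    "SUPABASE_SERVICE_ROLE_KEY",
    "PINECONE_INDEX_NAME",
    "ENV",
    "CORS_ALLOWED_ORIGINS",
)


def missing_env(env: Mapping[str, str] = os.environ) -> List[str]:
    """Names of required variables that are unset or blank."""
    return [name for name in REQUIRED_ENV if not env.get(name, "").strip()]
```

At startup, the app would call `missing_env()` and raise `SystemExit` listing the names if the result is non-empty.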

Database Setup

  1. Run migrations 12-22 (see db/migrations/)
  2. Register custom claims hook in Supabase Dashboard (Auth → Hooks → custom_access_token_hook)
  3. Verify RLS policies on all tables
  4. Create the first admin user manually: UPDATE profiles SET role = 'admin' WHERE user_id = '...' (the new role appears in JWTs issued after the user's next login or token refresh)

Background Jobs

  1. Deploy scripts/expire_reservations.py as systemd service (continuous)
  2. Set up cron for scripts/reconcile_wallets.py (daily 2 AM)
  3. Set up cron for scripts/export_chunks.py (weekly Sunday 3 AM)

Monitoring

  1. Set up log aggregation (structured JSON to CloudWatch/Datadog/etc.)
  2. Configure alerts for:
      • Circuit breaker opened
      • Wallet discrepancy detected
      • High ingestion failure rate
      • Reservation expiry rate > 10%

Future Enhancements (Not Implemented)

These are architectural considerations for future work, not current features:

  • Multi-region Pinecone deployment
  • Redis-backed rate limiter for multi-instance deployments
  • Quiz persistence and history
  • Usage analytics dashboard
  • Automated model reindexing pipeline
  • A/B testing framework for prompt variations
  • Multi-tenant isolation (currently single-tenant)

Reference Documentation

  • Service Implementation: app/services/ (18 services, 3,300 LOC)
  • Router Implementation: app/api/routers/ (7 routers)
  • Agent Orchestration: app/agents/teacher_agent.py (LangGraph)
  • Database Migrations: db/migrations/20260217*.sql
  • Background Jobs: scripts/expire_reservations.py, scripts/reconcile_wallets.py, scripts/export_chunks.py, scripts/reindex.py

Document Version: 2.0 (Concise Reference)
Last Verified: 2026-02-22 against the main branch