BacMR Backend Architecture

Status: Production-ready core, RAG integration complete
Last Updated: 2026-02-22
Audience: Developers and operators

Overview

BacMR is a RAG-powered educational platform for Mauritanian curriculum (Baccalaureate). The backend is built with FastAPI, PostgreSQL (Supabase), Pinecone vector search, and OpenAI models orchestrated through LangGraph.

Core Philosophy: Production services live on the main branch. This document describes what is actually implemented and deployed, not planned features.

Tech Stack

Component	Technology	Purpose
API Framework	FastAPI	REST API, dependency injection, async support
Database	PostgreSQL (Supabase)	User data, chunks, billing, audit logs
Vector Store	Pinecone	Embedding search (1536-dim, cosine)
LLM Provider	OpenAI	Embeddings (text-embedding-3-small), chat (gpt-4o), reranking (gpt-4o-mini)
Agent Framework	LangGraph	Teacher agent orchestration
Auth	Supabase Auth	JWT + custom claims via Postgres hook
Cache	In-memory LRU	Rerank results (15min), chunk text (1hr)

Architecture Overview

Client (Next.js)
    ↓ HTTPS + JWT
FastAPI Gateway (CORS, Auth)
    ↓
Service Layer (18 services)
    ├─ Auth & Wallet
    ├─ Retrieval Pipeline (RAG)
    ├─ Ingestion Pipeline
    ├─ Scraper & Quality
    └─ Observability
    ↓
Data Stores
    ├─ Postgres (canonical chunks, users, billing)
    ├─ Pinecone (vectors + lightweight metadata)
    └─ In-memory (cache)

Key Flows:
1. Chat: Reserve tokens → Retrieve (Pinecone) → Rerank (GPT-mini) → Answer (GPT-4o + LangGraph) → Finalize billing
2. Ingestion: Upload PDF → Parse → Chunk (deterministic IDs) → Embed → Upsert (Pinecone + Postgres)
3. Scraping: Fetch → Canonicalize → Dedupe (SimHash) → Quality check → Store

Service Layer (18 Services)

Authentication & Security

Auth Service (app/core/auth.py): JWT verification, role extraction from app_metadata.role (custom claims), admin guard
Request Middleware (app/core/middleware.py): Request-ID propagation (UUID v4), rate limiting (per-user/per-IP)

Retrieval & RAG

Retrieval Pipeline (app/services/retrieval_pipeline.py): Dense search → rerank → fetch chunks. Fully integrated with chat router.
GPT-mini Service (app/services/gpt_mini.py): Reranking, language detection, query translation
Pinecone Adapter (app/services/pinecone_adapter.py): Vector upsert/query with lightweight metadata
Embedding Service (app/services/embedding_service.py): Generate embeddings, track refs, upsert to Pinecone
Cache Service (app/services/cache.py): Dual LRU (rerank 15-min TTL, chunks 1-hr TTL)
Teacher Agent (app/agents/teacher_agent.py): LangGraph orchestration, wallet check → decide path → retrieve → answer

Billing & Wallet

Wallet Reservation Service (app/services/wallet_reservation.py): Reserve/finalize pattern, atomic transactions
Tier Config (app/services/tier_config.py): Free/Standard/Premium limits (top-K, rerank-N)

Ingestion

Ingestion Service (app/services/ingestion.py): State machine (queued → parsing → embedding → ready/failed), retry logic
Chunking Service (app/services/chunking.py): Deterministic chunk IDs (sha256(file_id:page:chunk_index)), token-based splitting
PDF Processor (app/services/pdf_processor.py): Extract text from PDFs
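
The ingestion state machine above can be sketched as a small transition table. This is a minimal illustration, not the code in app/services/ingestion.py: the names JobStatus, ALLOWED_TRANSITIONS, transition and retry are hypothetical, and both the retry cap of 3 and the rule that a retry moves a failed job back to queued are assumptions.

```python
from enum import Enum
from typing import Dict, Set, Tuple


class JobStatus(str, Enum):
    QUEUED = "queued"
    PARSING = "parsing"
    EMBEDDING = "embedding"
    READY = "ready"
    FAILED = "failed"


# Forward path: queued -> parsing -> embedding -> ready; any active step may fail.
ALLOWED_TRANSITIONS: Dict[JobStatus, Set[JobStatus]] = {
    JobStatus.QUEUED: {JobStatus.PARSING, JobStatus.FAILED},
    JobStatus.PARSING: {JobStatus.EMBEDDING, JobStatus.FAILED},
    JobStatus.EMBEDDING: {JobStatus.READY, JobStatus.FAILED},
    JobStatus.FAILED: {JobStatus.QUEUED},  # assumption: a retry re-queues the job
    JobStatus.READY: set(),
}

MAX_RETRIES = 3  # assumption: the real limit is not stated in this document


class InvalidTransition(ValueError):
    """Raised when a job is asked to move to a state it cannot reach."""


def transition(current: JobStatus, target: JobStatus) -> JobStatus:
    """Return the new status, or raise if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise InvalidTransition(f"{current.value} -> {target.value}")
    return target


def retry(current: JobStatus, retry_count: int) -> Tuple[JobStatus, int]:
    """Re-queue a failed job and bump its retry counter, up to MAX_RETRIES."""
    if current is not JobStatus.FAILED or retry_count >= MAX_RETRIES:
        raise InvalidTransition(f"cannot retry from {current.value} (retries={retry_count})")
    return transition(current, JobStatus.QUEUED), retry_count + 1
```

Each accepted transition would correspond to one row in ingestion_audit, and retry_count to the column of the same name on ingestion_jobs.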

Scraping & Quality

Scraper Service (app/services/scraper_service.py): Automated pipeline: canonicalize → dedupe → quality check → insert
Text Normalizer (app/services/text_normalizer.py): Arabic canonicalization (alef unification, tatweel removal)
Deduplication (app/services/deduplication.py): SimHash (64-bit) with Hamming distance ≤ 3
Quality Checker (app/services/quality_checker.py): Min length, OCR confidence, encoding validation
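
The canonicalize → dedupe steps can be illustrated with a short stdlib-only sketch. It is not the code in text_normalizer.py or deduplication.py: the function names are hypothetical, and whitespace tokenization, the BLAKE2b token hash and the exact set of alef variants are assumptions. Only the 64-bit fingerprint and the Hamming distance ≤ 3 threshold come from this document.

```python
import hashlib
import re

# Alef with madda, hamza above, hamza below, and alef wasla all map to bare alef.
_ALEF_VARIANTS = str.maketrans({
    "\u0622": "\u0627",
    "\u0623": "\u0627",
    "\u0625": "\u0627",
    "\u0671": "\u0627",
})
_TATWEEL = "\u0640"  # elongation character, carries no meaning


def canonicalize_arabic(text: str) -> str:
    """Unify alef variants, drop tatweel, and collapse whitespace."""
    text = text.translate(_ALEF_VARIANTS).replace(_TATWEEL, "")
    return re.sub(r"\s+", " ", text).strip()


def simhash64(text: str) -> int:
    """64-bit SimHash over whitespace tokens (tokenization is an assumption)."""
    weights = [0] * 64
    for token in text.split():
        digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        for bit in range(64):
            weights[bit] += 1 if (h >> bit) & 1 else -1
    fingerprint = 0
    for bit, weight in enumerate(weights):
        if weight > 0:
            fingerprint |= 1 << bit
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: int, b: int, max_distance: int = 3) -> bool:
    """Apply the Hamming distance <= 3 rule to two fingerprints."""
    return hamming_distance(a, b) <= max_distance
```

Canonicalizing before fingerprinting means that spelling variants which differ only in alef form or tatweel hash to the same tokens, so they cannot push two copies of a document apart.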

Features

Quiz Generator (app/services/quiz_generator.py): RAG-based quiz generation (service ready, router stub exists)
Upload Service (app/services/upload.py): Presigned URL generation for S3/GCS

Observability

Circuit Breaker (app/services/circuit_breaker.py): Protect OpenAI/Pinecone (3 failures → open, 120s recovery)
Metrics (app/core/metrics.py): Prometheus-compatible counters/histograms/gauges
Logging (app/core/logging.py): Structured JSON logs with request-ID propagation

Router Organization

Router	Endpoints	Status	Notes
`/auth`	signup, login	✅ Working	Supabase delegation
`/me`	profile	✅ Working	User profile
`/chat`	POST /chat	✅ Working	SSE streaming, LangGraph agent
`/wallet`	balance, reserve, finalize	✅ Working	Reservation pattern
`/admin`	users, roles	✅ Working	JWT admin role required
`/metrics`	scrape, debug JSON, health	✅ Working	`/metrics` is scrape path (no auth), `/metrics/prometheus` + `/metrics/json` are admin-only
`/quizzes`	generate	⚠️ Stub	Service ready, router needs wiring
`/scraping`	sync	⚠️ Partial	Service ready, needs full integration
`/curriculum`	references	✅ Working	Curriculum metadata

Not Included in API Router (remaining gaps):
- Quiz router exists but not wired to /quizzes/generate implementation
- No ingestion router (admin uploads via frontend, no direct API)

Database Schema (PostgreSQL)

Core Tables

profiles: User profiles, hint_level, created_at
wallet: token_balance, subscription_tier (free/standard/premium)
wallet_ledger: delta, reason, request_id, reservation_id (audit trail)
reservations: estimated, actual, status (reserved/finalized/expired), expires_at

Content Tables

chunks: chunk_id (deterministic SHA256), file_id, page_number, content, token_count, language
embedding_refs: chunk_id → pinecone_vector_id, namespace tracking
documents: file metadata (uploaded PDFs)
references: curriculum references, canonical_id (deduplication), content_fingerprint (SimHash)

Ingestion

ingestion_jobs: status (queued/parsing/embedding/ready/failed), retry_count, chunks_created, vectors_upserted
ingestion_audit: state transitions, error messages

Scraping

scrape_runs: timestamp, source, status

RLS Status: Enabled on all public tables. Service role bypasses RLS. Custom JWT claims implemented via Postgres hook.

Authentication Flow

JWT + Custom Claims

User logs in via Supabase Auth → receives JWT
Postgres hook (custom_access_token_hook) injects app_metadata.role from profiles table into JWT
FastAPI verifies JWT, extracts role from app_metadata.role
Role-based access control: student/teacher/admin

Current State:
- Custom claims hook implemented (migration 18)
- Auth service reads from app_metadata.role (with user_metadata.role fallback)

Roles:
- student: Default, access to chat/quiz/wallet
- teacher: Same as student (future: analytics access)
- admin: Full access to admin endpoints (user management, scraping, ingestion)
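
As an illustration of the role lookup described above, the following sketch resolves a role from already-verified JWT claims. It is not the code in app/core/auth.py: extract_role and is_admin are hypothetical names, and treating an unknown or missing role as "student" is an assumption based on student being the default role.

```python
from typing import Any, Mapping

VALID_ROLES = {"student", "teacher", "admin"}
DEFAULT_ROLE = "student"


def extract_role(claims: Mapping[str, Any]) -> str:
    """Return the role from decoded JWT claims.

    Prefers app_metadata.role, falls back to user_metadata.role, and
    defaults to "student" when neither holds a recognized role.
    """
    for section in ("app_metadata", "user_metadata"):
        metadata = claims.get(section)
        if not isinstance(metadata, Mapping):
            continue
        role = metadata.get("role")
        if isinstance(role, str) and role in VALID_ROLES:
            return role
    return DEFAULT_ROLE


def is_admin(claims: Mapping[str, Any]) -> bool:
    """Admin guard: true only when the resolved role is exactly "admin"."""
    return extract_role(claims) == "admin"
```

One design point worth checking against the real implementation: in Supabase, user_metadata is generally writable by the user, so a fallback to it is safest when limited to non-privileged roles.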

RAG Pipeline (Retrieval + LangGraph Agent)

Flow

User Question
    ↓
LangGraph Agent (teacher_agent.py)
    ├─ check_wallet: Reserve tokens
    ├─ decide_path: Simple answer or RAG needed?
    └─ retrieve_context: If RAG needed
        ↓
Retrieval Pipeline
    ├─ detect_language (GPT-mini)
    ├─ translate_query (if needed)
    ├─ embed_query (OpenAI)
    ├─ dense_search (Pinecone top-K)
    ├─ rerank (GPT-mini top-N, cached)
    └─ fetch_chunks (cache → Postgres)
        ↓
Teacher Agent
    ├─ generate_answer (GPT-4o, streaming)
    └─ finalize (deduct actual tokens, log usage)
        ↓
SSE Stream to Client

Tier-Based Limits

Tier	Dense Top-K	Rerank Top-N	Cache TTL
Free	10	3	15 min
Standard	20	5	15 min
Premium	30	8	15 min
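
The table above maps naturally onto a small lookup structure. This is a sketch rather than the contents of app/services/tier_config.py: TierLimits, TIER_LIMITS and limits_for are hypothetical names, and falling back to the free tier for an unknown tier is an assumption.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class TierLimits:
    dense_top_k: int        # candidates requested from Pinecone
    rerank_top_n: int       # chunks kept after reranking
    cache_ttl_seconds: int  # rerank cache lifetime


TIER_LIMITS: Dict[str, TierLimits] = {
    "free": TierLimits(dense_top_k=10, rerank_top_n=3, cache_ttl_seconds=15 * 60),
    "standard": TierLimits(dense_top_k=20, rerank_top_n=5, cache_ttl_seconds=15 * 60),
    "premium": TierLimits(dense_top_k=30, rerank_top_n=8, cache_ttl_seconds=15 * 60),
}


def limits_for(tier: str) -> TierLimits:
    """Look up limits by tier name; unknown tiers get the most restrictive limits."""
    return TIER_LIMITS.get(tier.lower(), TIER_LIMITS["free"])
```

For example, limits_for("premium").dense_top_k gives the top-K to pass to the dense search, and rerank_top_n bounds how many chunks reach the answer prompt.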

What Goes Where

Full chunk text: Postgres chunks.content (source of truth)
Vectors: Pinecone (1536-dim, cosine similarity)
Metadata in Pinecone: chunk_id, file_id, language, grade, subject, page_number, ingestion_ts (< 1 KB per vector)
Cached rerank results: In-memory LRU (key: sha256(query+namespace+tier))
Cached chunk text: In-memory LRU (key: chunk_id)

Wallet Reservation Pattern

Problem: LLM calls have unpredictable token usage. Pre-deducting max tokens wastes balance; post-deducting risks overdrafts.

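Solution: Two-phase reserve/finalize. Reserve an estimate up front, then settle to the actual usage once the call completes. Each phase runs in its own atomic transaction.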

Reserve (Before LLM Call)

-- Hold the estimated cost up front so concurrent requests cannot spend it.
BEGIN;
    UPDATE wallet SET token_balance = token_balance - :estimated WHERE user_id = :uid;
    INSERT INTO reservations (user_id, estimated, status, expires_at)
        VALUES (:uid, :estimated, 'reserved', now() + interval '5 minutes');
COMMIT;

Finalize (After LLM Call)

BEGIN;
    UPDATE reservations SET actual = :actual, status = 'finalized' WHERE id = :res_id;
    UPDATE wallet SET token_balance = token_balance + (:estimated - :actual) WHERE user_id = :uid;
    INSERT INTO wallet_ledger (user_id, delta, reason, reservation_id)
        VALUES (:uid, -:actual, 'agent_chat', :res_id);
COMMIT;

Expiry (Background Job)

Every 60 seconds: expire reservations that are still un-finalized after 5 minutes (past expires_at) and refund the held tokens.
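
The reserve, finalize and expiry steps can be walked through with an in-memory model. This is only a sketch of the arithmetic, not the Postgres-backed code in wallet_reservation.py: WalletBook and its methods are hypothetical, and the up-front balance check in reserve is an assumption that the SQL above does not show.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

RESERVATION_TTL_SECONDS = 5 * 60


class InsufficientBalance(Exception):
    """Raised when the wallet cannot cover the estimated cost."""


@dataclass
class Reservation:
    user_id: str
    estimated: int
    expires_at: float
    status: str = "reserved"
    actual: int = 0


@dataclass
class WalletBook:
    balances: Dict[str, int]
    reservations: Dict[int, Reservation] = field(default_factory=dict)
    ledger: List[Tuple[str, int, str, int]] = field(default_factory=list)
    next_id: int = 1

    def reserve(self, user_id: str, estimated: int, now: float) -> int:
        """Phase 1: hold the estimate (balance check is an assumption)."""
        if self.balances.get(user_id, 0) < estimated:
            raise InsufficientBalance(user_id)
        self.balances[user_id] -= estimated
        res_id = self.next_id
        self.next_id += 1
        self.reservations[res_id] = Reservation(
            user_id, estimated, expires_at=now + RESERVATION_TTL_SECONDS
        )
        return res_id

    def finalize(self, res_id: int, actual: int) -> None:
        """Phase 2: refund the unused part of the hold and record actual usage."""
        res = self.reservations[res_id]
        if res.status != "reserved":
            raise ValueError(f"reservation {res_id} is {res.status}")
        res.actual, res.status = actual, "finalized"
        self.balances[res.user_id] += res.estimated - actual
        self.ledger.append((res.user_id, -actual, "agent_chat", res_id))

    def expire(self, now: float) -> int:
        """Background job: refund holds that were never finalized in time."""
        expired = 0
        for res in self.reservations.values():
            if res.status == "reserved" and res.expires_at <= now:
                res.status = "expired"
                self.balances[res.user_id] += res.estimated
                expired += 1
        return expired
```

With a balance of 1000, reserving 300 and finalizing at 120 leaves 880 and a single ledger row with delta -120; an expired hold returns the full estimate and writes no ledger row.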

Reconciliation (Daily)

Compare wallet.token_balance with SUM(wallet_ledger.delta). Flag discrepancies (no auto-correct).
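
A reconciliation pass along these lines could look as follows. It is a sketch, not scripts/reconcile_wallets.py: find_discrepancies is a hypothetical name, and two assumptions are made. First, the ledger is taken to contain every credit as well as every debit. Second, tokens held by still-open reservations are subtracted from the ledger sum, because a hold lowers token_balance before any ledger row exists.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Optional, Tuple


def find_discrepancies(
    balances: Mapping[str, int],
    ledger: Iterable[Tuple[str, int]],
    open_holds: Optional[Mapping[str, int]] = None,
) -> Dict[str, Tuple[int, int]]:
    """Return {user_id: (actual_balance, expected_balance)} for mismatches only.

    expected = SUM(ledger deltas) - tokens held by open reservations.
    Nothing is corrected here; the caller only flags the result.
    """
    expected: Dict[str, int] = defaultdict(int)
    for user_id, delta in ledger:
        expected[user_id] += delta
    for user_id, held in (open_holds or {}).items():
        expected[user_id] -= held

    report: Dict[str, Tuple[int, int]] = {}
    for user_id in set(balances) | set(expected):
        actual = balances.get(user_id, 0)
        want = expected.get(user_id, 0)
        if actual != want:
            report[user_id] = (actual, want)
    return report
```

Running the job at a quiet hour keeps the number of open holds small, but accounting for them explicitly avoids false alarms from requests that are in flight during the check.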

Chunking Strategy

Deterministic Chunk IDs

chunk_id = sha256(file_id + ":" + page_number + ":" + chunk_index)

Benefit: Re-ingesting the same file produces identical chunk IDs, so Pinecone upserts are idempotent.
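
The formula above translates directly to Python's hashlib. The function name and the choice of a hex digest over UTF-8 bytes are assumptions; the document only specifies the hashed string.

```python
import hashlib


def chunk_id(file_id: str, page_number: int, chunk_index: int) -> str:
    """Deterministic ID: sha256 of "file_id:page_number:chunk_index" as hex."""
    key = f"{file_id}:{page_number}:{chunk_index}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()
```

Because the ID depends only on the file and the chunk's position, not on the chunk text or a timestamp, a second ingestion run overwrites the same Pinecone vectors and Postgres rows instead of creating duplicates.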

Token-Based Chunking

Language	Chunk Size	Overlap	Tokenizer
French	512 tokens	64 tokens	tiktoken cl100k_base
Arabic (MSA)	384 tokens	48 tokens	tiktoken cl100k_base
Hassaniya	384 tokens	48 tokens	tiktoken cl100k_base

Rationale: Arabic tokenizes at ~1.5× expansion; smaller chunks maintain quality.
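
A sliding-window split over token IDs is one way to realize the sizes and overlaps in the table. The sketch below is tokenizer-agnostic and is not the code in app/services/chunking.py: CHUNK_PARAMS and split_tokens are hypothetical names, and the assumption is that overlap means each chunk repeats the last tokens of the previous one. In practice the token list would come from tiktoken, for example tiktoken.get_encoding("cl100k_base").encode(text).

```python
from typing import Dict, List, Sequence, Tuple

# (chunk_size, overlap) in tokens, taken from the table above.
CHUNK_PARAMS: Dict[str, Tuple[int, int]] = {
    "french": (512, 64),
    "arabic": (384, 48),
    "hassaniya": (384, 48),
}


def split_tokens(tokens: Sequence[int], chunk_size: int, overlap: int) -> List[List[int]]:
    """Split token IDs into windows of chunk_size that share overlap tokens."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks: List[List[int]] = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the input
    return chunks
```

With the French parameters, a 1000-token page yields windows starting at tokens 0, 448 and 896, and each window begins with the final 64 tokens of the one before it.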

Design Patterns

Circuit Breaker

Protects: OpenAI API, Pinecone API
Threshold: 3 failures in 60s → open circuit
Recovery: 120s timeout, then half-open (1 test request)
Fallback:
- Rerank failure → use dense-retrieval order
- Embedding failure → queue for retry
- Chat failure → return 503 to client
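
The thresholds above can be sketched as a small single-threaded breaker. This is an illustration, not app/services/circuit_breaker.py: the class and method names are hypothetical, the injectable clock exists only to make the example testable, and the rule that a failed half-open probe reopens the circuit is an assumption.

```python
import time
from collections import deque
from typing import Any, Callable, Deque, Optional


class CircuitOpenError(RuntimeError):
    """Raised instead of calling the protected service while the circuit is open."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 3,
        window_seconds: float = 60.0,
        recovery_seconds: float = 120.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.window_seconds = window_seconds
        self.recovery_seconds = recovery_seconds
        self._clock = clock
        self._failures: Deque[float] = deque()
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.recovery_seconds:
            return "half_open"
        return "open"

    def call(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        state = self.state
        if state == "open":
            raise CircuitOpenError("circuit is open")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._record_failure(state)
            raise
        if state == "half_open":
            # The probe succeeded: close the circuit and forget old failures.
            self._opened_at = None
            self._failures.clear()
        return result

    def _record_failure(self, state: str) -> None:
        now = self._clock()
        if state == "half_open":
            self._opened_at = now  # assumption: a failed probe reopens the circuit
            return
        self._failures.append(now)
        while self._failures and now - self._failures[0] > self.window_seconds:
            self._failures.popleft()
        if len(self._failures) >= self.failure_threshold:
            self._opened_at = now
```

A caller would catch CircuitOpenError and apply the matching fallback, for example keeping the dense-retrieval order when the rerank call is short-circuited.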

Caching

Rerank Cache: Key = sha256(query + namespace + tier), TTL = 15 min
Chunk Cache: Key = chunk_id, TTL = 1 hr
Invalidation: On re-ingestion of any file in namespace
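
A TTL-bounded LRU of the kind described above fits in a few lines of stdlib Python. The sketch is not app/services/cache.py: TTLCache and rerank_cache_key are hypothetical names, max_entries is an assumed parameter, and the unit-separator character between key parts is an assumption added to avoid ambiguous concatenations (the document only states query + namespace + tier).

```python
import hashlib
import time
from collections import OrderedDict
from typing import Any, Callable, Optional, Tuple


def rerank_cache_key(query: str, namespace: str, tier: str) -> str:
    """sha256 over query, namespace and tier (separator is an assumption)."""
    raw = "\x1f".join((query, namespace, tier))
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


class TTLCache:
    """LRU cache whose entries also expire after ttl_seconds."""

    def __init__(
        self,
        max_entries: int,
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_entries = max_entries
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._data: "OrderedDict[str, Tuple[float, Any]]" = OrderedDict()

    def get(self, key: str) -> Optional[Any]:
        item = self._data.get(key)
        if item is None:
            return None
        expires_at, value = item
        if self._clock() >= expires_at:
            del self._data[key]  # expired: treat as a miss
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def set(self, key: str, value: Any) -> None:
        self._data[key] = (self._clock() + self.ttl_seconds, value)
        self._data.move_to_end(key)
        while len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # evict least recently used

    def clear(self) -> None:
        """Invalidation hook, e.g. after a re-ingestion."""
        self._data.clear()
```

The dual cache would then be two instances, one with ttl_seconds=900 for rerank results and one with ttl_seconds=3600 for chunk text. Because rerank keys are hashes, a namespace cannot be recovered from a key, so the simplest invalidation on re-ingestion is to clear the whole rerank cache.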

Request-ID Propagation

Every request gets UUID v4 request_id
Propagated to:
- All log lines (structured JSON)
- reservations.request_id
- wallet_ledger.request_id
- OpenAI API calls (user parameter)
- SSE done event
- Error responses
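
One way to carry the request ID into every log line and error body without passing it through each function is a context variable. The sketch below uses only the standard library and is not the code in app/core/middleware.py or app/core/logging.py: request_id_var, new_request_id, JsonFormatter and error_body are hypothetical names, and the JSON field names are assumptions.

```python
import contextvars
import json
import logging
import uuid

# Holds the ID of the request currently being handled ("-" outside a request).
request_id_var: contextvars.ContextVar = contextvars.ContextVar("request_id", default="-")


def new_request_id() -> str:
    """Generate a UUID v4 and bind it to the current context."""
    rid = str(uuid.uuid4())
    request_id_var.set(rid)
    return rid


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object including the request ID."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "request_id": request_id_var.get(),
        }
        return json.dumps(payload, ensure_ascii=False)


def error_body(detail: str) -> dict:
    """Error payload that echoes the request ID back to the client."""
    return {"detail": detail, "request_id": request_id_var.get()}
```

A middleware would call new_request_id() at the start of each request; the same value can then be written to reservations.request_id and wallet_ledger.request_id and sent in the SSE done event. Context variables are isolated per asyncio task, so concurrent requests do not see each other's IDs.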

Rate Limiting

Scope	Limit	Window	Applies To
Per-user	10 req	1 min	/chat, /ask, /quizzes/generate
Per-user	30 req	1 min	/wallet/, /upload/
Per-IP	5 req	1 min	/auth/signup, /auth/login
Per-user (admin)	60 req	1 min	/admin/, /ingestion/

Implementation: In-memory sliding window, valid for a single instance only. Multi-instance deployments need a shared backend such as Redis.
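
An in-memory sliding window can be kept as a queue of recent timestamps per key. This sketch is not the middleware's actual code: SlidingWindowLimiter is a hypothetical name, the key format is left to the caller, and the injectable clock is there only for testing.

```python
import time
from collections import defaultdict, deque
from typing import Callable, DefaultDict, Deque


class SlidingWindowLimiter:
    """Allow at most `limit` hits per key within the trailing window."""

    def __init__(
        self,
        limit: int,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self._clock = clock
        self._hits: DefaultDict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self._clock()
        hits = self._hits[key]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

Each row of the table would be one limiter instance, for example SlidingWindowLimiter(limit=5) keyed by client IP for /auth/login. Since the state lives in process memory, two instances behind a load balancer would each allow the full limit, which is why a shared store is needed before scaling out.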

What's NOT Wired Up

Missing Integrations

Quiz Router: QuizGeneratorService exists, but /quizzes/generate returns stub response
Ingestion Router: No API endpoint for admin PDF upload (frontend handles via presigned URLs)
Scraper Admin: Scraper service exists but admin UI integration incomplete

Not Implemented

Multi-instance deployment (rate limiter needs Redis)
Blob storage integration (presigned URLs exist but not tested)
Quiz persistence (quizzes generated but not saved to DB)
Usage analytics dashboard
Automated reindexing (script exists but no cron)

Background Jobs

Job	File	Schedule	Status
Reservation Expiry	`scripts/expire_reservations.py`	Continuous (60s loop)	Ready, needs deployment
Wallet Reconciliation	`scripts/reconcile_wallets.py`	Daily 2 AM	Ready, needs cron setup
DR Export	`scripts/export_chunks.py`	Weekly Sunday 3 AM	Ready, needs cron setup
Reindex	`scripts/reindex.py`	On-demand	Ready for manual trigger

Key Metrics

Available (Prometheus format + JSON):
- ingestion_job_duration_seconds (histogram)
- ingestion_job_status_total (counter)
- openai_request_duration_seconds (histogram)
- openai_tokens_used_total (counter)
- wallet_reservation_total (counter)
- circuit_breaker_state (gauge: 0=closed, 1=open)
- http_request_duration_seconds (histogram)

Collection: app/core/metrics.py (singleton MetricsCollector)

Export & Auth:
- GET /metrics: Prometheus scrape endpoint (no auth), intended for internal/allowlisted Prometheus pull access.
- GET /metrics/prometheus: Prometheus format with admin auth (manual/diagnostic access).
- GET /metrics/json: Structured JSON with admin auth (debug view).

Verification: Router wiring in app/api/router.py; endpoint behavior covered by tests/routers/test_metrics.py (public scrape + admin-only JSON/Prometheus views)

Security

Secrets Management

Current: .env file for local development
Production: Use cloud secret manager (GCP/AWS/Azure) or Render environment groups
Never commit: OPENAI_API_KEY, PINECONE_API_KEY, SUPABASE_SERVICE_ROLE_KEY

RLS (Row-Level Security)

Enabled on: profiles, wallet, wallet_ledger, reservations, usage_logs
System tables: Only service_role can access (chunks, ingestion_jobs, embedding_refs)
Admin tables: Admin role + service_role (references, scrape_runs)

PII Redaction

Strip emails/phone numbers before sending to OpenAI
Don't send user_id or wallet balance to OpenAI
Log redacted versions in usage_logs
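
A regex-based pass is one simple way to strip emails and phone numbers before a prompt leaves the backend. The patterns below are deliberately loose illustrations, not the project's actual rules: redact_pii, the placeholder strings and both expressions are assumptions, and the phone pattern can also match other long digit runs such as ISO dates.

```python
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Optional "+", then at least 8 digits/separators, starting and ending on a digit.
PHONE_RE = re.compile(r"(?<!\w)\+?\d[\d\s().-]{6,}\d(?!\w)")


def redact_pii(text: str) -> str:
    """Replace emails and phone-like sequences with fixed placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)
```

Applying the same function to the text stored in usage_logs keeps the logged version consistent with what was sent to OpenAI.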

Testing Strategy

Working Flows (Tested)

Auth: signup → login → profile → JWT verification
Wallet: balance → reserve → finalize → ledger entry
Chat: reserve → retrieve → rerank → answer (SSE stream) → finalize
Admin: list users → update role → verify custom claims

Integration Tests Needed

Ingestion: upload → parse → chunk → embed → upsert
Quiz: reserve → retrieve → generate → finalize
Scraper: sync → canonicalize → dedupe → quality → insert

Postman Collection

40+ endpoints documented
10 testing workflows
Auto-capture JWT, request-ID, reservation-ID

Dependency Injection Pattern

File: app/core/dependencies.py

All services are instantiated as module-level singletons and wired together in dependency order:

# 1. External Clients
openai_client = OpenAI(api_key=settings.OPENAI_API_KEY)
supabase_service = create_client(...)

# 2. Adapters
pinecone_adapter = PineconeAdapter(...)
cache_service = CacheService()

# 3. Core Services
embedding_service = EmbeddingService(openai_client, supabase_service, pinecone_adapter)
gpt_mini_service = GPTMiniService(openai_client)
wallet_service = WalletReservationService(supabase_service)

# 4. Pipelines
retrieval_pipeline = RetrievalPipeline(
    openai_client, supabase_service, pinecone_adapter,
    embedding_service, gpt_mini_service, cache_service
)

# 5. Features
quiz_generator = QuizGeneratorService(openai_client, retrieval_pipeline)

Usage in Routers:

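from fastapi import APIRouter, Depends

from app.core.auth import get_current_user  # assumed location of the auth dependency
from app.core.dependencies import wallet_service

router = APIRouter()

@router.get("/balance")
async def get_balance(user: dict = Depends(get_current_user)):
    return wallet_service.get_balance(user["id"])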

Migration Summary

Applied Migrations (main branch)

12: ingestion_jobs table
13: chunks enhanced (deterministic IDs)
14: reservations table
15: embedding_refs tracking
16: RLS for new tables
17: references enhancements (SimHash, canonical_id)
18: JWT custom claims hook
19: Update RLS for JWT claims
20: transactions table
21: documents enrichment
22: Fix RLS for service_role

Deployment Checklist

Environment Variables (Required)

OPENAI_API_KEY=sk-...
PINECONE_API_KEY=...
SUPABASE_URL=https://....supabase.co
SUPABASE_SERVICE_ROLE_KEY=...
PINECONE_INDEX_NAME=curriculum-1536
ENV=prod
CORS_ALLOWED_ORIGINS=https://app.bacmr.mr
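
A startup check can fail fast when one of the variables above is missing. This is a sketch under the assumption that all seven are mandatory in production; REQUIRED_SETTINGS and missing_settings are hypothetical names, not part of the codebase.

```python
from typing import List, Mapping

REQUIRED_SETTINGS = (
    "OPENAI_API_KEY",
    "PINECONE_API_KEY",
    "SUPABASE_URL",
    "SUPABASE_SERVICE_ROLE_KEY",
    "PINECONE_INDEX_NAME",
    "ENV",
    "CORS_ALLOWED_ORIGINS",
)


def missing_settings(env: Mapping[str, str]) -> List[str]:
    """Return the names of required variables that are unset or blank."""
    return [name for name in REQUIRED_SETTINGS if not env.get(name, "").strip()]
```

At startup this could be called as missing_settings(os.environ), aborting with the list of missing names rather than failing later on the first OpenAI or Pinecone call.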

Database Setup

Run migrations 12-22 (see db/migrations/)
Register custom claims hook in Supabase Dashboard (Auth → Hooks → custom_access_token_hook)
Verify RLS policies on all tables
Create first admin user manually: UPDATE profiles SET role = 'admin' WHERE user_id = '...'

Background Jobs

Deploy scripts/expire_reservations.py as systemd service (continuous)
Setup cron for scripts/reconcile_wallets.py (daily 2 AM)
Setup cron for scripts/export_chunks.py (weekly Sunday 3 AM)

Monitoring

Setup log aggregation (structured JSON to CloudWatch/Datadog/etc.)
Configure alerts:
- Circuit breaker opened
- Wallet discrepancy detected
- High ingestion failure rate
- Reservation expiry rate > 10%

Future Enhancements (Not Implemented)

These are architectural considerations for future work, not current features:

Multi-region Pinecone deployment
Redis-backed rate limiter for multi-instance deployments
Quiz persistence and history
Usage analytics dashboard
Automated model reindexing pipeline
A/B testing framework for prompt variations
Multi-tenant isolation (currently single-tenant)

Reference Documentation

Service Implementation: app/services/ (18 services, 3,300 LOC)
Router Implementation: app/api/routers/ (7 routers)
Agent Orchestration: app/agents/teacher_agent.py (LangGraph)
Database Migrations: db/migrations/20260217*.sql
Background Jobs: scripts/expire_reservations.py, scripts/reconcile_wallets.py, scripts/export_chunks.py, scripts/reindex.py

Document Version: 2.0 (Concise Reference)
Last Verified: 2026-02-22 against main branch