BacMR Backend Architecture
Status: Production-ready core, RAG integration complete
Last Updated: 2026-02-22
Audience: Developers and operators
Overview
BacMR is a RAG-powered educational platform for Mauritanian curriculum (Baccalaureate). The backend is built with FastAPI, PostgreSQL (Supabase), Pinecone vector search, and OpenAI models orchestrated through LangGraph.
Core Philosophy: Production services exist in the main branch. This document describes what is actually implemented and deployed, not planned features.
Tech Stack
| Component | Technology | Purpose |
|---|---|---|
| API Framework | FastAPI | REST API, dependency injection, async support |
| Database | PostgreSQL (Supabase) | User data, chunks, billing, audit logs |
| Vector Store | Pinecone | Embedding search (1536-dim, cosine) |
| LLM Provider | OpenAI | Embeddings (text-embedding-3-small), chat (gpt-4o), reranking (gpt-4o-mini) |
| Agent Framework | LangGraph | Teacher agent orchestration |
| Auth | Supabase Auth | JWT + custom claims via Postgres hook |
| Cache | In-memory LRU | Rerank results (15min), chunk text (1hr) |
Architecture Overview
Client (Next.js)
↓ HTTPS + JWT
FastAPI Gateway (CORS, Auth)
↓
Service Layer (18 services)
├─ Auth & Wallet
├─ Retrieval Pipeline (RAG)
├─ Ingestion Pipeline
├─ Scraper & Quality
└─ Observability
↓
Data Stores
├─ Postgres (canonical chunks, users, billing)
├─ Pinecone (vectors + lightweight metadata)
└─ In-memory (cache)
Key Flows:
1. Chat: Reserve tokens → Retrieve (Pinecone) → Rerank (GPT-mini) → Answer (GPT-4o + LangGraph) → Finalize billing
2. Ingestion: Upload PDF → Parse → Chunk (deterministic IDs) → Embed → Upsert (Pinecone + Postgres)
3. Scraping: Fetch → Canonicalize → Dedupe (SimHash) → Quality check → Store
Service Layer (18 Services)
Authentication & Security
- Auth Service (app/core/auth.py): JWT verification, role extraction from app_metadata.role (custom claims), admin guard
- Request Middleware (app/core/middleware.py): Request-ID propagation (UUID v4), rate limiting (per-user/per-IP)
Retrieval & RAG
- Retrieval Pipeline (app/services/retrieval_pipeline.py): Dense search → rerank → fetch chunks. Fully integrated with chat router.
- GPT-mini Service (app/services/gpt_mini.py): Reranking, language detection, query translation
- Pinecone Adapter (app/services/pinecone_adapter.py): Vector upsert/query with lightweight metadata
- Embedding Service (app/services/embedding_service.py): Generate embeddings, track refs, upsert to Pinecone
- Cache Service (app/services/cache.py): Dual LRU (rerank 15-min TTL, chunks 1-hr TTL)
- Teacher Agent (app/agents/teacher_agent.py): LangGraph orchestration, wallet check → decide path → retrieve → answer
Billing & Wallet
- Wallet Reservation Service (app/services/wallet_reservation.py): Reserve/finalize pattern, atomic transactions
- Tier Config (app/services/tier_config.py): Free/Standard/Premium limits (top-K, rerank-N)
Ingestion
- Ingestion Service (app/services/ingestion.py): State machine (queued → parsing → embedding → ready/failed), retry logic
- Chunking Service (app/services/chunking.py): Deterministic chunk IDs (sha256(file_id:page:chunk_index)), token-based splitting
- PDF Processor (app/services/pdf_processor.py): Extract text from PDFs
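The ingestion state machine above can be sketched as a small transition table. This is a minimal illustration, not the code in app/services/ingestion.py: the IngestionJob class, the MAX_RETRIES value, the "failed → queued" retry edge, and the assumption that any non-terminal state may fail are all hypothetical.

```python
from dataclasses import dataclass

# Allowed transitions, following queued → parsing → embedding → ready/failed.
# Letting every non-terminal state fail, and re-queueing on retry, are assumptions.
TRANSITIONS = {
    "queued": {"parsing", "failed"},
    "parsing": {"embedding", "failed"},
    "embedding": {"ready", "failed"},
    "failed": {"queued"},
    "ready": set(),  # terminal state
}
MAX_RETRIES = 3  # hypothetical limit


@dataclass
class IngestionJob:
    status: str = "queued"
    retry_count: int = 0

    def advance(self, new_status: str) -> None:
        """Move to new_status, rejecting transitions the table does not allow."""
        if new_status not in TRANSITIONS[self.status]:
            raise ValueError(f"illegal transition {self.status} -> {new_status}")
        self.status = new_status

    def retry(self) -> bool:
        """Re-queue a failed job; returns False once retries are exhausted."""
        if self.status != "failed" or self.retry_count >= MAX_RETRIES:
            return False
        self.retry_count += 1
        self.status = "queued"
        return True
```

Keeping the transitions in one table makes it straightforward to write each accepted transition to an audit table such as ingestion_audit.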
Scraping & Quality
- Scraper Service (app/services/scraper_service.py): Automated pipeline: canonicalize → dedupe → quality check → insert
- Text Normalizer (app/services/text_normalizer.py): Arabic canonicalization (alef unification, tatweel removal)
- Deduplication (app/services/deduplication.py): SimHash (64-bit) with Hamming distance ≤ 3
- Quality Checker (app/services/quality_checker.py): Min length, OCR confidence, encoding validation
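The canonicalize → dedupe steps can be illustrated with a short sketch. It assumes whitespace tokens as SimHash features and SHA-256 as the per-token hash; the actual feature extraction in app/services/deduplication.py is not described here, and the function names below are hypothetical.

```python
import hashlib

ALEF_VARIANTS = "\u0622\u0623\u0625"  # alef with madda, hamza above, hamza below
BARE_ALEF = "\u0627"
TATWEEL = "\u0640"  # elongation character


def canonicalize(text: str) -> str:
    """Unify alef variants to bare alef and remove tatweel."""
    for ch in ALEF_VARIANTS:
        text = text.replace(ch, BARE_ALEF)
    return text.replace(TATWEEL, "")


def simhash64(text: str) -> int:
    """64-bit SimHash over whitespace tokens (the feature choice is an assumption)."""
    weights = [0] * 64
    for token in canonicalize(text).split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        for bit in range(64):
            weights[bit] += 1 if (h >> bit) & 1 else -1
    return sum(1 << bit for bit in range(64) if weights[bit] > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: str, b: str, max_distance: int = 3) -> bool:
    """Apply the Hamming distance ≤ 3 threshold from the list above."""
    return hamming(simhash64(a), simhash64(b)) <= max_distance
```

Because canonicalization runs before hashing, two texts that differ only in alef form or tatweel produce the same fingerprint.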
Features
- Quiz Generator (app/services/quiz_generator.py): RAG-based quiz generation (service ready, router stub exists)
- Upload Service (app/services/upload.py): Presigned URL generation for S3/GCS
Observability
- Circuit Breaker (app/services/circuit_breaker.py): Protect OpenAI/Pinecone (3 failures → open, 120s recovery)
- Metrics (app/core/metrics.py): Prometheus-compatible counters/histograms/gauges
- Logging (app/core/logging.py): Structured JSON logs with request-ID propagation
Router Organization
| Router | Endpoints | Status | Notes |
|---|---|---|---|
| /auth | signup, login | ✅ Working | Supabase delegation |
| /me | profile | ✅ Working | User profile |
| /chat | POST /chat | ✅ Working | SSE streaming, LangGraph agent |
| /wallet | balance, reserve, finalize | ✅ Working | Reservation pattern |
| /admin | users, roles | ✅ Working | JWT admin role required |
| /metrics | scrape, debug JSON, health | ✅ Working | /metrics is scrape path (no auth), /metrics/prometheus + /metrics/json are admin-only |
| /quizzes | generate | ⚠️ Stub | Service ready, router needs wiring |
| /scraping | sync | ⚠️ Partial | Service ready, needs full integration |
| /curriculum | references | ✅ Working | Curriculum metadata |
Not Included in API Router (remaining gaps):
- Quiz router exists but not wired to /quizzes/generate implementation
- No ingestion router (admin uploads via frontend, no direct API)
Database Schema (PostgreSQL)
Core Tables
- profiles: User profiles, hint_level, created_at
- wallet: token_balance, subscription_tier (free/standard/premium)
- wallet_ledger: delta, reason, request_id, reservation_id (audit trail)
- reservations: estimated, actual, status (reserved/finalized/expired), expires_at
Content Tables
- chunks: chunk_id (deterministic SHA256), file_id, page_number, content, token_count, language
- embedding_refs: chunk_id → pinecone_vector_id, namespace tracking
- documents: file metadata (uploaded PDFs)
- references: curriculum references, canonical_id (deduplication), content_fingerprint (SimHash)
Ingestion
- ingestion_jobs: status (queued/parsing/embedding/ready/failed), retry_count, chunks_created, vectors_upserted
- ingestion_audit: state transitions, error messages
Scraping
- scrape_runs: timestamp, source, status
RLS Status: Enabled on all public tables. Service role bypasses RLS. Custom JWT claims implemented via Postgres hook.
Authentication Flow
JWT + Custom Claims
- User logs in via Supabase Auth → receives JWT
- Postgres hook (custom_access_token_hook) injects app_metadata.role from profiles table into JWT
- FastAPI verifies JWT, extracts role from app_metadata.role
- Role-based access control: student/teacher/admin
Current State:
- Custom claims hook implemented (migration 18)
- Auth service reads from app_metadata.role (with user_metadata.role fallback)
Roles:
- student: Default, access to chat/quiz/wallet
- teacher: Same as student (future: analytics access)
- admin: Full access to admin endpoints (user management, scraping, ingestion)
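The role lookup described above (app_metadata.role first, user_metadata.role as fallback) might look like the following sketch. It assumes the JWT has already been verified and decoded into a dict; extract_role and require_admin are hypothetical names, not the functions in app/core/auth.py, and defaulting unknown values to student is an assumption based on student being the default role.

```python
from typing import Any, Mapping

VALID_ROLES = {"student", "teacher", "admin"}
DEFAULT_ROLE = "student"


def extract_role(claims: Mapping[str, Any]) -> str:
    """Read the role from verified JWT claims, preferring app_metadata."""
    app_role = (claims.get("app_metadata") or {}).get("role")
    if app_role is not None:
        # A role set by the hook wins, even if it is not a recognised value.
        return app_role if app_role in VALID_ROLES else DEFAULT_ROLE
    user_role = (claims.get("user_metadata") or {}).get("role")
    return user_role if user_role in VALID_ROLES else DEFAULT_ROLE


def require_admin(claims: Mapping[str, Any]) -> None:
    """Admin guard: raise unless the caller holds the admin role."""
    if extract_role(claims) != "admin":
        raise PermissionError("admin role required")
```

In Supabase, user_metadata can be changed by the signed-in user while app_metadata cannot, so the fallback path deserves extra scrutiny wherever it grants elevated roles.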
RAG Pipeline (Retrieval + LangGraph Agent)
Flow
User Question
↓
LangGraph Agent (teacher_agent.py)
├─ check_wallet: Reserve tokens
├─ decide_path: Simple answer or RAG needed?
└─ retrieve_context: If RAG needed
↓
Retrieval Pipeline
├─ detect_language (GPT-mini)
├─ translate_query (if needed)
├─ embed_query (OpenAI)
├─ dense_search (Pinecone top-K)
├─ rerank (GPT-mini top-N, cached)
└─ fetch_chunks (cache → Postgres)
↓
Teacher Agent
├─ generate_answer (GPT-4o, streaming)
└─ finalize (deduct actual tokens, log usage)
↓
SSE Stream to Client
Tier-Based Limits
| Tier | Dense Top-K | Rerank Top-N | Cache TTL |
|---|---|---|---|
| Free | 10 | 3 | 15 min |
| Standard | 20 | 5 | 15 min |
| Premium | 30 | 8 | 15 min |
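The tier table maps directly onto a small lookup structure. The sketch below is one possible shape for it; TierLimits and limits_for are hypothetical names, and falling back to the Free limits for unknown tiers is an assumption rather than documented behaviour of app/services/tier_config.py.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierLimits:
    dense_top_k: int    # candidates requested from Pinecone
    rerank_top_n: int   # chunks kept after reranking
    rerank_cache_ttl_s: int = 15 * 60  # 15 min for every tier


TIER_LIMITS = {
    "free": TierLimits(dense_top_k=10, rerank_top_n=3),
    "standard": TierLimits(dense_top_k=20, rerank_top_n=5),
    "premium": TierLimits(dense_top_k=30, rerank_top_n=8),
}


def limits_for(tier: str) -> TierLimits:
    """Look up limits; unknown tiers get the most restrictive set (an assumption)."""
    return TIER_LIMITS.get(tier.lower(), TIER_LIMITS["free"])
```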
What Goes Where
- Full chunk text: Postgres chunks.content (source of truth)
- Vectors: Pinecone (1536-dim, cosine similarity)
- Metadata in Pinecone: chunk_id, file_id, language, grade, subject, page_number, ingestion_ts (< 1 KB per vector)
- Cached rerank results: In-memory LRU (key: sha256(query+namespace+tier))
- Cached chunk text: In-memory LRU (key: chunk_id)
Wallet Reservation Pattern
Problem: LLM calls have unpredictable token usage. Pre-deducting max tokens wastes balance; post-deducting risks overdrafts.
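Solution: A two-phase reserve/finalize pattern. Deduct an estimate before the call, then refund the unused portion once actual usage is known.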
Reserve (Before LLM Call)
BEGIN;
UPDATE wallet SET token_balance = token_balance - :estimated WHERE user_id = :uid;
INSERT INTO reservations (user_id, estimated, status, expires_at)
  VALUES (:uid, :estimated, 'reserved', now() + interval '5 minutes');
COMMIT;
Finalize (After LLM Call)
BEGIN;
UPDATE reservations SET actual = :actual, status = 'finalized' WHERE id = :res_id;
UPDATE wallet SET token_balance = token_balance + (:estimated - :actual) WHERE user_id = :uid;
INSERT INTO wallet_ledger (user_id, delta, reason, reservation_id)
VALUES (:uid, -:actual, 'agent_chat', :res_id);
COMMIT;
Expiry (Background Job)
Every 60 seconds: Expire un-finalized reservations > 5 min old, refund tokens.
Reconciliation (Daily)
Compare wallet.token_balance with SUM(wallet_ledger.delta). Flag discrepancies (no auto-correct).
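The reserve, finalize, and reconciliation steps can be exercised end to end with an in-memory SQLite database. This is a sketch under stated assumptions, not the Postgres implementation: the schema is trimmed to the columns used here, the function names are hypothetical, and the balance guard in reserve (token_balance >= estimated) is an addition that the SQL above does not show.

```python
import sqlite3

SCHEMA = """
CREATE TABLE wallet (user_id TEXT PRIMARY KEY, token_balance INTEGER NOT NULL);
CREATE TABLE reservations (id INTEGER PRIMARY KEY, user_id TEXT, estimated INTEGER,
                           actual INTEGER, status TEXT);
CREATE TABLE wallet_ledger (user_id TEXT, delta INTEGER, reason TEXT,
                            reservation_id INTEGER);
"""


def reserve(conn: sqlite3.Connection, uid: str, estimated: int) -> int:
    """Deduct the estimate and record a reservation in one transaction."""
    with conn:  # commits on success, rolls back on exception
        cur = conn.execute(
            "UPDATE wallet SET token_balance = token_balance - ? "
            "WHERE user_id = ? AND token_balance >= ?",  # guard is an assumption
            (estimated, uid, estimated),
        )
        if cur.rowcount == 0:
            raise ValueError("insufficient balance or unknown user")
        cur = conn.execute(
            "INSERT INTO reservations (user_id, estimated, status) "
            "VALUES (?, ?, 'reserved')",
            (uid, estimated),
        )
        return cur.lastrowid


def finalize(conn: sqlite3.Connection, uid: str, res_id: int, actual: int) -> None:
    """Refund estimated - actual and write the ledger entry."""
    with conn:
        row = conn.execute(
            "SELECT estimated FROM reservations WHERE id = ? AND status = 'reserved'",
            (res_id,),
        ).fetchone()
        if row is None:
            raise ValueError("reservation missing or already closed")
        conn.execute(
            "UPDATE reservations SET actual = ?, status = 'finalized' WHERE id = ?",
            (actual, res_id),
        )
        conn.execute(
            "UPDATE wallet SET token_balance = token_balance + ? WHERE user_id = ?",
            (row[0] - actual, uid),
        )
        conn.execute(
            "INSERT INTO wallet_ledger VALUES (?, ?, 'agent_chat', ?)",
            (uid, -actual, res_id),
        )


def reconcile(conn: sqlite3.Connection, uid: str) -> bool:
    """True when the wallet balance equals the sum of ledger deltas."""
    (balance,) = conn.execute(
        "SELECT token_balance FROM wallet WHERE user_id = ?", (uid,)
    ).fetchone()
    (ledger,) = conn.execute(
        "SELECT COALESCE(SUM(delta), 0) FROM wallet_ledger WHERE user_id = ?", (uid,)
    ).fetchone()
    return balance == ledger
```

In this sketch the two figures only agree when no reservation is outstanding, because a reserved estimate reduces the balance before any ledger row exists. A reconciliation job built this way would need to account for open reservations.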
Chunking Strategy
Deterministic Chunk IDs
- Format: sha256(file_id:page:chunk_index)
- Benefit: Re-ingesting the same file produces identical chunk IDs → Pinecone upserts are idempotent.
Token-Based Chunking
| Language | Chunk Size | Overlap | Tokenizer |
|---|---|---|---|
| French | 512 tokens | 64 tokens | tiktoken cl100k_base |
| Arabic (MSA) | 384 tokens | 48 tokens | tiktoken cl100k_base |
| Hassaniya | 384 tokens | 48 tokens | tiktoken cl100k_base |
Rationale: Arabic tokenizes at ~1.5× expansion; smaller chunks maintain quality.
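Deterministic IDs and overlapping token windows can be sketched together. The splitter below works on any token list, so tiktoken is deliberately left out. The language keys in CHUNK_PARAMS and the function names are hypothetical, and the sketch ignores page boundaries, which the real chunker may respect.

```python
import hashlib
from typing import List, Sequence

# (chunk size, overlap) in tokens, from the table above; the keys are assumptions.
CHUNK_PARAMS = {"fr": (512, 64), "ar": (384, 48), "hassaniya": (384, 48)}


def chunk_id(file_id: str, page: int, chunk_index: int) -> str:
    """sha256 over "file_id:page:chunk_index", returned as a hex digest."""
    raw = f"{file_id}:{page}:{chunk_index}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def split_tokens(tokens: Sequence, size: int, overlap: int) -> List[Sequence]:
    """Slide a window of `size` tokens, stepping by size - overlap."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reached the end
    return chunks
```

Since the ID depends only on the file, page, and chunk position, re-running the splitter on the same input yields the same IDs, which is what makes the Pinecone upserts idempotent.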
Design Patterns
Circuit Breaker
- Protects: OpenAI API, Pinecone API
- Threshold: 3 failures in 60s → open circuit
- Recovery: 120s timeout, then half-open (1 test request)
- Fallback:
  - Rerank failure → use dense-retrieval order
  - Embedding failure → queue for retry
  - Chat failure → return 503 to client
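The thresholds above can be expressed as a small state machine. The class below is a minimal single-threaded sketch, not the code in app/services/circuit_breaker.py: the method names are hypothetical, and restarting the recovery timer after a failed half-open probe is an assumption.

```python
import time
from collections import deque


class CircuitBreaker:
    def __init__(self, threshold=3, window_s=60.0, recovery_s=120.0,
                 clock=time.monotonic):
        self.threshold = threshold
        self.window_s = window_s
        self.recovery_s = recovery_s
        self._clock = clock            # injectable for tests
        self._failures = deque()       # timestamps of recent failures
        self._opened_at = None         # None while the circuit is closed
        self._probe_in_flight = False

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.recovery_s:
            return "half_open"
        return "open"

    def allow(self) -> bool:
        """Closed: allow everything. Half-open: allow a single test request."""
        state = self.state
        if state == "closed":
            return True
        if state == "half_open" and not self._probe_in_flight:
            self._probe_in_flight = True
            return True
        return False

    def record_success(self) -> None:
        self._failures.clear()
        self._opened_at = None
        self._probe_in_flight = False

    def record_failure(self) -> None:
        now = self._clock()
        if self._opened_at is not None:
            # Failed probe: stay open and restart the recovery timer (assumption).
            self._opened_at = now
            self._probe_in_flight = False
            return
        self._failures.append(now)
        while self._failures and now - self._failures[0] > self.window_s:
            self._failures.popleft()
        if len(self._failures) >= self.threshold:
            self._opened_at = now
```

A caller would check allow() before each OpenAI or Pinecone request and apply the matching fallback when it returns False, for example keeping the dense-retrieval order when reranking is unavailable.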
Caching
- Rerank Cache: Key = sha256(query + namespace + tier), TTL = 15 min
- Chunk Cache: Key = chunk_id, TTL = 1 hr
- Invalidation: On re-ingestion of any file in namespace
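A TTL-bounded LRU cache with the hashed rerank key can be sketched with an OrderedDict. This is an illustration rather than the code in app/services/cache.py: the class name, the max_items parameter, and the unit-separator character placed between key parts (to avoid ambiguous concatenations) are assumptions.

```python
import hashlib
import time
from collections import OrderedDict


def rerank_cache_key(query: str, namespace: str, tier: str) -> str:
    """sha256 over query, namespace and tier; the separator is an assumption."""
    raw = "\x1f".join((query, namespace, tier))
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


class TTLLRUCache:
    def __init__(self, max_items: int, ttl_s: float, clock=time.monotonic):
        self.max_items = max_items
        self.ttl_s = ttl_s
        self._clock = clock
        self._data = OrderedDict()  # key -> (expires_at, value)

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        expires_at, value = item
        if self._clock() >= expires_at:
            del self._data[key]  # expired entries are dropped lazily
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def set(self, key, value) -> None:
        self._data[key] = (self._clock() + self.ttl_s, value)
        self._data.move_to_end(key)
        while len(self._data) > self.max_items:
            self._data.popitem(last=False)  # evict least recently used

    def clear(self) -> None:
        """Coarse invalidation, e.g. after a file in the namespace is re-ingested."""
        self._data.clear()
```

Two instances with different TTLs (900 s for rerank results, 3600 s for chunk text) would give the dual-cache arrangement listed above.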
Request-ID Propagation
- Every request gets UUID v4 request_id
- Propagated to:
  - All log lines (structured JSON)
  - reservations.request_id
  - wallet_ledger.request_id
  - OpenAI API calls (user parameter)
  - SSE done event
  - Error responses
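One common way to carry a request ID into every log line is a context variable read by the log formatter. The sketch below uses only the standard library and assumes that approach; the names are hypothetical, and the FastAPI middleware that would call new_request_id() per request is not shown.

```python
import contextvars
import json
import logging
import uuid

# Holds the current request's ID; each async task sees its own value.
request_id_var = contextvars.ContextVar("request_id", default=None)


def new_request_id() -> str:
    """Generate a UUID v4 and bind it to the current context."""
    rid = str(uuid.uuid4())
    request_id_var.set(rid)
    return rid


class JsonFormatter(logging.Formatter):
    """Emit structured JSON log lines that always include request_id."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "request_id": request_id_var.get(),
        })
```

The same context variable can then be read wherever the ID is needed, such as when writing reservations.request_id or building an error response.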
Rate Limiting
| Scope | Limit | Window | Applies To |
|---|---|---|---|
| Per-user | 10 req | 1 min | /chat, /ask, /quizzes/generate |
| Per-user | 30 req | 1 min | /wallet/, /upload/ |
| Per-IP | 5 req | 1 min | /auth/signup, /auth/login |
| Per-user (admin) | 60 req | 1 min | /admin/, /ingestion/ |
Implementation: In-memory sliding window (single instance). For multi-instance: use Redis backend.
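An in-memory sliding window limiter can be written in a few lines. This sketch assumes one deque of timestamps per key (user ID or IP); the class name is hypothetical. Like the implementation described above, it only works within a single process.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    def __init__(self, limit: int, window_s: float = 60.0, clock=time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self._clock = clock
        self._hits = defaultdict(deque)  # key -> timestamps of accepted requests

    def allow(self, key: str) -> bool:
        """Return True and record the hit if `key` is under its limit."""
        now = self._clock()
        hits = self._hits[key]
        # Drop timestamps that have left the window.
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

For the table above, one limiter per row would be created, for example SlidingWindowLimiter(limit=5) keyed by client IP for /auth/login. State kept this way is lost on restart and is not shared between instances, which is why multi-instance deployments need a shared backend such as Redis.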
What's NOT Wired Up
Missing Integrations
- Quiz Router: QuizGeneratorService exists, but /quizzes/generate returns stub response
- Ingestion Router: No API endpoint for admin PDF upload (frontend handles via presigned URLs)
- Scraper Admin: Scraper service exists but admin UI integration incomplete
Not Implemented
- Multi-instance deployment (rate limiter needs Redis)
- Blob storage integration (presigned URLs exist but not tested)
- Quiz persistence (quizzes generated but not saved to DB)
- Usage analytics dashboard
- Automated reindexing (script exists but no cron)
Background Jobs
| Job | File | Schedule | Status |
|---|---|---|---|
| Reservation Expiry | scripts/expire_reservations.py | Continuous (60s loop) | Ready, needs deployment |
| Wallet Reconciliation | scripts/reconcile_wallets.py | Daily 2 AM | Ready, needs cron setup |
| DR Export | scripts/export_chunks.py | Weekly Sunday 3 AM | Ready, needs cron setup |
| Reindex | scripts/reindex.py | On-demand | Ready for manual trigger |
Key Metrics
Available (Prometheus format + JSON):
- ingestion_job_duration_seconds (histogram)
- ingestion_job_status_total (counter)
- openai_request_duration_seconds (histogram)
- openai_tokens_used_total (counter)
- wallet_reservation_total (counter)
- circuit_breaker_state (gauge: 0=closed, 1=open)
- http_request_duration_seconds (histogram)
Collection: app/core/metrics.py (singleton MetricsCollector)
Export & Auth:
- GET /metrics: Prometheus scrape endpoint (no auth), intended for internal/allowlisted Prometheus pull access.
- GET /metrics/prometheus: Prometheus format with admin auth (manual/diagnostic access).
- GET /metrics/json: Structured JSON with admin auth (debug view).
Verification: Router wiring in app/api/router.py; endpoint behavior covered by tests/routers/test_metrics.py (public scrape + admin-only JSON/Prometheus views)
Security
Secrets Management
- Current: .env file for local development
- Production: Use cloud secret manager (GCP/AWS/Azure) or Render environment groups
- Never commit: OPENAI_API_KEY, PINECONE_API_KEY, SUPABASE_SERVICE_ROLE_KEY
RLS (Row-Level Security)
- Enabled on: profiles, wallet, wallet_ledger, reservations, usage_logs
- System tables: Only service_role can access (chunks, ingestion_jobs, embedding_refs)
- Admin tables: Admin role + service_role (references, scrape_runs)
PII Redaction
- Strip emails/phone numbers before sending to OpenAI
- Don't send user_id or wallet balance to OpenAI
- Log redacted versions in usage_logs
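The first redaction rule can be approximated with two regular expressions. The patterns and placeholder tokens below are assumptions chosen for illustration, not the rules used in the codebase. The phone pattern is deliberately loose and will also match other long digit runs such as dates, so it errs on the side of over-redaction.

```python
import re

# Simplified patterns (assumptions): they favour recall over precision.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{6,}\d")


def redact_pii(text: str) -> str:
    """Replace emails and phone-like digit runs before text leaves the service."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)
```

Running the same function over the text stored in usage_logs would keep the logged and transmitted versions consistent.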
Testing Strategy
Working Flows (Tested)
- Auth: signup → login → profile → JWT verification
- Wallet: balance → reserve → finalize → ledger entry
- Chat: reserve → retrieve → rerank → answer (SSE stream) → finalize
- Admin: list users → update role → verify custom claims
Integration Tests Needed
- Ingestion: upload → parse → chunk → embed → upsert
- Quiz: reserve → retrieve → generate → finalize
- Scraper: sync → canonicalize → dedupe → quality → insert
Postman Collection
- 40+ endpoints documented
- 10 testing workflows
- Auto-capture JWT, request-ID, reservation-ID
Dependency Injection Pattern
File: app/core/dependencies.py
All services instantiated as module-level singletons with proper dependency wiring:
from openai import OpenAI
from supabase import create_client
# settings and the service classes below are imported from the app's own modules (omitted here)
# 1. External Clients
openai_client = OpenAI(api_key=settings.OPENAI_API_KEY)
supabase_service = create_client(...)
# 2. Adapters
pinecone_adapter = PineconeAdapter(...)
cache_service = CacheService()
# 3. Core Services
embedding_service = EmbeddingService(openai_client, supabase_service, pinecone_adapter)
gpt_mini_service = GPTMiniService(openai_client)
wallet_service = WalletReservationService(supabase_service)
# 4. Pipelines
retrieval_pipeline = RetrievalPipeline(
    openai_client, supabase_service, pinecone_adapter,
    embedding_service, gpt_mini_service, cache_service
)
# 5. Features
quiz_generator = QuizGeneratorService(openai_client, retrieval_pipeline)
Usage in Routers:
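from fastapi import APIRouter, Depends
from app.core.auth import get_current_user  # assumed location of the auth dependency
from app.core.dependencies import wallet_service
router = APIRouter()
@router.get("/balance")
async def get_balance(user: dict = Depends(get_current_user)):
    return wallet_service.get_balance(user["id"])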
Migration Summary
Applied Migrations (main branch)
- 12: ingestion_jobs table
- 13: chunks enhanced (deterministic IDs)
- 14: reservations table
- 15: embedding_refs tracking
- 16: RLS for new tables
- 17: references enhancements (SimHash, canonical_id)
- 18: JWT custom claims hook
- 19: Update RLS for JWT claims
- 20: transactions table
- 21: documents enrichment
- 22: Fix RLS for service_role
Deployment Checklist
Environment Variables (Required)
OPENAI_API_KEY=sk-...
PINECONE_API_KEY=...
SUPABASE_URL=https://....supabase.co
SUPABASE_SERVICE_ROLE_KEY=...
PINECONE_INDEX_NAME=curriculum-1536
ENV=prod
CORS_ALLOWED_ORIGINS=https://app.bacmr.mr
Database Setup
- Run migrations 12-22 (see db/migrations/)
- Register custom claims hook in Supabase Dashboard (Auth → Hooks → custom_access_token_hook)
- Verify RLS policies on all tables
- Create first admin user manually: UPDATE profiles SET role = 'admin' WHERE user_id = '...'
Background Jobs
- Deploy scripts/expire_reservations.py as systemd service (continuous)
- Set up cron for scripts/reconcile_wallets.py (daily 2 AM)
- Set up cron for scripts/export_chunks.py (weekly Sunday 3 AM)
Monitoring
- Set up log aggregation (structured JSON to CloudWatch/Datadog/etc.)
- Configure alerts:
  - Circuit breaker opened
  - Wallet discrepancy detected
  - High ingestion failure rate
  - Reservation expiry rate > 10%
Future Enhancements (Not Implemented)
These are architectural considerations for future work, not current features:
- Multi-region Pinecone deployment
- Redis-backed rate limiter for multi-instance deployments
- Quiz persistence and history
- Usage analytics dashboard
- Automated model reindexing pipeline
- A/B testing framework for prompt variations
- Multi-tenant isolation (currently single-tenant)
Reference Documentation
- Service Implementation: app/services/ (18 services, 3,300 LOC)
- Router Implementation: app/api/routers/ (7 routers)
- Agent Orchestration: app/agents/teacher_agent.py (LangGraph)
- Database Migrations: db/migrations/20260217*.sql
- Background Jobs: scripts/expire_reservations.py, scripts/reconcile_wallets.py, scripts/export_chunks.py, scripts/reindex.py
Document Version: 2.0 (Concise Reference)
Last Verified: 2026-02-22 against main branch