Data Model
Schema ownership
The current schema is defined by:
- baseline SQL in
db/bootstrap.sql - incremental changes in
db/migrations/
For this branch, the effective schema is the baseline plus every migration in lexical order. db/bootstrap.sql alone is not sufficient to describe the current backend tables and columns.
Core relational areas
Identity and user context
profiles: role, full name, email, grade, major, tier, guidance, teacher metadataparent_children: parent-owned child records used for parent retrieval context- custom JWT role claims are managed by migration hooks under
db/migrations/20260217000018_*and later role taxonomy updates
Wallet and financial tracking
wallet: current token balance and subscription tierwallet_ledger: token deltas and reasonsreservations: reserve-then-finalize billing recordstransactions: money-side admin top-ups and expenses
Curriculum and retrieval
documents: one row per ingested curriculum file, with readiness and namespace metadatachunks: canonical and original chunk text plus lexical fieldsembedding_refs: mapping from chunk IDs to Pinecone vector IDs and namespacesusage_logs: chat retrieval and answer usage logging
Scraping and ingestion operations
scrape_runs: execution history for source sync jobsreferences: discovered source documents and ingestion statusingestion_jobs: stateful job records for processing referencesingestion_audit: job transition audit trailingestion_handoffs: scrape-to-ingestion orchestration records
Search-specific data model
The current branch uses hybrid retrieval:
- Pinecone stores semantic vectors keyed by deterministic chunk IDs
- Postgres stores canonical chunk text used by the lexical search RPC
- migration
20260301000031_hybrid_retrieval_lexical_rpc.sqldefinessearch_chunks_lexical
Namespace convention is derived in code as:
grade-<grade>-<subject>
This namespace is written back to both documents and references.
Security and policy model
Row-level security is enabled through migrations, not the bootstrap file. Important policy-related migrations include:
- phase 1 and phase 2 RLS enablement
- service-role exceptions
- admin access helpers
- unified role taxonomy changes
When debugging access issues, read the latest role and RLS migrations before trusting historical summaries.