Skip to content

Data Model

Schema ownership

The current schema is defined by:

  • baseline SQL in db/bootstrap.sql
  • incremental changes in db/migrations/

For this branch, the effective schema is the baseline plus every migration in lexical order. db/bootstrap.sql alone is not sufficient to describe the current backend tables and columns.

Core relational areas

Identity and user context

  • profiles: role, full name, email, grade, major, tier, guidance, teacher metadata
  • parent_children: parent-owned child records used for parent retrieval context
  • custom JWT role claims are managed by migration hooks under db/migrations/20260217000018_* and later role taxonomy updates

Wallet and financial tracking

  • wallet: current token balance and subscription tier
  • wallet_ledger: token deltas and reasons
  • reservations: reserve-then-finalize billing records
  • transactions: money-side admin top-ups and expenses

Curriculum and retrieval

  • documents: one row per ingested curriculum file, with readiness and namespace metadata
  • chunks: canonical and original chunk text plus lexical fields
  • embedding_refs: mapping from chunk IDs to Pinecone vector IDs and namespaces
  • usage_logs: chat retrieval and answer usage logging

Scraping and ingestion operations

  • scrape_runs: execution history for source sync jobs
  • references: discovered source documents and ingestion status
  • ingestion_jobs: stateful job records for processing references
  • ingestion_audit: job transition audit trail
  • ingestion_handoffs: scrape-to-ingestion orchestration records

Search-specific data model

The current branch uses hybrid retrieval:

  • Pinecone stores semantic vectors keyed by deterministic chunk IDs
  • Postgres stores canonical chunk text used by the lexical search RPC
  • migration 20260301000031_hybrid_retrieval_lexical_rpc.sql defines search_chunks_lexical

Namespace convention is derived in code as:

  • grade-<grade>-<subject>

This namespace is written back to both documents and references.

Security and policy model

Row-level security is enabled through migrations, not the bootstrap file. Important policy-related migrations include:

  • phase 1 and phase 2 RLS enablement
  • service-role exceptions
  • admin access helpers
  • unified role taxonomy changes

When debugging access issues, read the latest role and RLS migrations before trusting historical summaries.