Ingestion Handoff Orchestration (Issue #26)

What changed

When POST /scraping/{source}/sync persists references, the backend now immediately runs a handoff orchestration step keyed by reference_id:

  1. Load references touched by the current scrape_run_id.
  2. Create/update an ingestion_handoffs record per (reference_id, payload_hash).
  3. Queue ingestion via ingestion_jobs using reference_id.
  4. Track handoff lifecycle + reason codes for observability.

Lifecycle

ingestion_handoffs.status uses:

  • queued
  • running
  • completed
  • failed

reason_code captures why the transition happened (examples: queued_from_scrape, ingestion_job_queued, ingestion_job_already_active, retry_exhausted).
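The status values and reason codes above could be modelled roughly as follows. This is a minimal sketch, not the project's actual code: HandoffStatus, HandoffTransition, and make_transition are hypothetical names, and the only rules assumed are that status must be one of the four listed values and that reason_code is non-empty.

```python
from dataclasses import dataclass
from enum import Enum


class HandoffStatus(str, Enum):
    """The four ingestion_handoffs.status values listed in this document."""
    QUEUED = "queued"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass(frozen=True)
class HandoffTransition:
    """Hypothetical record pairing a new status with the reason for it."""
    status: HandoffStatus
    reason_code: str


def make_transition(status: str, reason_code: str) -> HandoffTransition:
    # Assumption: every transition carries a non-empty reason_code.
    if not reason_code:
        raise ValueError("reason_code must be non-empty")
    # HandoffStatus(...) raises ValueError for anything outside the four values.
    return HandoffTransition(HandoffStatus(status), reason_code)
```

For example, make_transition("queued", "queued_from_scrape") yields a transition whose status is HandoffStatus.QUEUED, while an unknown status string is rejected.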

Retry strategy

  • Max attempts: 3 (configurable at service init).
  • Retry only on transient errors (timeouts, connection/rate-limit/503 patterns).
  • Backoff is exponential: delay = base * 2^(attempt-1). The default base is 0 in the current implementation, so retries run without delay and sync calls stay fast.
  • On max-attempt exhaustion, handoff is marked failed with reason_code=retry_exhausted.
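
The retry rules above can be sketched as below. This is an illustration under stated assumptions, not the service's real implementation: run_with_retry, is_transient, backoff_delay, RetryOutcome, and the TRANSIENT_MARKERS list are hypothetical, and the sketch simply re-raises non-transient errors because the document does not say how those are recorded.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional

# Assumption: transient errors are recognised by exception type or message text.
TRANSIENT_MARKERS = ("timeout", "timed out", "connection", "rate limit", "503")


@dataclass(frozen=True)
class RetryOutcome:
    succeeded: bool
    attempts: int
    reason_code: Optional[str]  # "retry_exhausted" when all attempts fail


def is_transient(exc: Exception) -> bool:
    if isinstance(exc, (TimeoutError, ConnectionError)):
        return True
    message = str(exc).lower()
    return any(marker in message for marker in TRANSIENT_MARKERS)


def backoff_delay(attempt: int, base: float = 0.0) -> float:
    # base * 2^(attempt-1); with the default base of 0 every delay is 0.
    return base * 2 ** (attempt - 1)


def run_with_retry(
    operation: Callable[[], None],
    max_attempts: int = 3,
    base: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
) -> RetryOutcome:
    for attempt in range(1, max_attempts + 1):
        try:
            operation()
            return RetryOutcome(True, attempt, None)
        except Exception as exc:
            if not is_transient(exc):
                raise  # non-transient errors are not retried
            if attempt == max_attempts:
                return RetryOutcome(False, attempt, "retry_exhausted")
            sleep(backoff_delay(attempt, base))
    raise ValueError("max_attempts must be at least 1")
```

With a non-zero base such as 0.5, the waits after the first and second failed attempts would be 0.5 s and 1.0 s; with the default base of 0 the three attempts run back to back.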

Idempotency / dedupe

  • DB-level dedupe: UNIQUE(reference_id, payload_hash) on ingestion_handoffs.
  • Runtime dedupe: if the latest ingestion job for a reference is already active/ready, no duplicate job payload is inserted.
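
The DB-level dedupe can be illustrated with an in-memory SQLite table standing in for the real database. Everything beyond the UNIQUE(reference_id, payload_hash) constraint is an assumption: the column set is reduced to three columns, payload_hash is assumed to be a SHA-256 over canonical JSON (the document does not say how it is computed), and record_handoff is a hypothetical helper that ignores duplicates rather than updating them.

```python
import hashlib
import json
import sqlite3


def payload_hash(payload: dict) -> str:
    # Assumption: hash of key-sorted, compact JSON so key order does not matter.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def record_handoff(conn: sqlite3.Connection, reference_id: str, payload: dict) -> bool:
    """Insert a handoff row; return False if the same pair already exists."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO ingestion_handoffs "
        "(reference_id, payload_hash, status) VALUES (?, ?, 'queued')",
        (reference_id, payload_hash(payload)),
    )
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ingestion_handoffs ("
    " reference_id TEXT NOT NULL,"
    " payload_hash TEXT NOT NULL,"
    " status TEXT NOT NULL,"
    " UNIQUE (reference_id, payload_hash))"
)
```

Calling record_handoff twice with the same reference_id and an equivalent payload inserts one row; a changed payload produces a new hash and therefore a new row.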

Queryability

  • GET /scraping/{source}/handoffs lists handoff records.
  • Supports filters: status, scrape_run_id, pagination (limit, offset).
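
A request URL for the listing endpoint might be built like this. The base URL, the source name, and the default limit/offset values are placeholders chosen for illustration; only the path and the four filter names come from this document.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def handoffs_url(
    base_url: str,
    source: str,
    status: Optional[str] = None,
    scrape_run_id: Optional[str] = None,
    limit: int = 50,   # assumed default, not confirmed by the source
    offset: int = 0,
) -> str:
    params = {}
    # Optional filters are sent only when provided.
    if status is not None:
        params["status"] = status
    if scrape_run_id is not None:
        params["scrape_run_id"] = scrape_run_id
    params["limit"] = limit
    params["offset"] = offset
    path = f"/scraping/{quote(source, safe='')}/handoffs"
    return f"{base_url.rstrip('/')}{path}?{urlencode(params)}"


# Hypothetical host and source name.
print(handoffs_url("http://localhost:8000", "example_source", status="failed", limit=10))
```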

Sync response additions

POST /scraping/{source}/sync now includes:

  • handoff_queued_count
  • handoff_completed_count
  • handoff_failed_count
  • handoff_skipped_count
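
One way these four counters could be derived from per-reference outcomes is sketched below. summarize_handoffs is a hypothetical helper, and it assumes each reference ends the sync call with exactly one outcome label out of queued, completed, failed, or skipped (with skipped presumably covering deduplicated handoffs).

```python
from collections import Counter
from typing import Dict, Iterable

OUTCOMES = ("queued", "completed", "failed", "skipped")


def summarize_handoffs(outcomes: Iterable[str]) -> Dict[str, int]:
    counts = Counter(outcomes)
    # Always emit all four fields, using 0 for outcomes that did not occur.
    return {f"handoff_{name}_count": counts.get(name, 0) for name in OUTCOMES}
```

For instance, outcomes of two queued, one failed, and one skipped handoff would give handoff_queued_count 2, handoff_completed_count 0, handoff_failed_count 1, and handoff_skipped_count 1.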

Rollback plan (migration 029)

If rollback is required, treat this migration as data-destructive for handoff history only; it does not delete references or ingestion_jobs rows.

DROP TRIGGER IF EXISTS trg_update_ingestion_handoff_timestamp ON ingestion_handoffs;
DROP TABLE IF EXISTS ingestion_handoffs;
DROP FUNCTION IF EXISTS update_ingestion_handoff_timestamp();

Post-rollback expectation: /scraping/{source}/sync still persists references, but handoff orchestration and the query endpoint (GET /scraping/{source}/handoffs) are unavailable until migration 029 is re-applied.