Schedule and triggers

Scheduling, triggers, and concurrency for Scrape

Schedule

Scrape runs as a Cloud Run Job named scrape.
The schedule is external to the job: it is started by Cloud Scheduler or by a manual trigger.
Expected runtime varies with the number of clinics; the Cloud Run timeout is 2 hours.
Do not add new orgs or clinics to scrape.

Triggers

Manual execution via Cloud Run Jobs.
Optional scheduler-based triggers (not defined in this repo).
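
As a rough illustration of a manual trigger, the sketch below builds and runs the gcloud CLI command that executes a Cloud Run Job. The helper names build_execute_command and execute_job are hypothetical and are not part of this repo. It assumes an installed and authenticated gcloud CLI, and any region value passed in is a placeholder.

```python
import subprocess


def build_execute_command(job_name="scrape", region=None, wait=False):
    """Build the argv for `gcloud run jobs execute` (hypothetical helper)."""
    cmd = ["gcloud", "run", "jobs", "execute", job_name]
    if region:
        # The region is an assumption; use the one the job is deployed in.
        cmd += ["--region", region]
    if wait:
        # --wait blocks until the execution completes.
        cmd.append("--wait")
    return cmd


def execute_job(job_name="scrape", region=None, wait=False):
    """Run the command; raises CalledProcessError if gcloud exits non-zero."""
    return subprocess.run(build_execute_command(job_name, region, wait), check=True)
```

Calling execute_job(wait=True) would start one execution of the scrape job and block until it finishes. A Cloud Scheduler trigger, if one exists, is configured outside this repo and is not shown here.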

Concurrency and idempotency

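Job-level parallelism is 1, so a single task runs per execution; within that task, a semaphore limits clinic processing to 5 clinics at a time.
Spanner writes use insert_or_update mutations, so re-running the job overwrites existing rows instead of failing on duplicate keys.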
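
The two points above can be sketched together. This is a minimal illustration, not the job's actual code: it assumes the semaphore is an asyncio.Semaphore, and it uses a hypothetical in-memory FakeTable in place of Spanner so the upsert behaviour can be seen without a database. All function and variable names are placeholders.

```python
import asyncio

MAX_CONCURRENT_CLINICS = 5  # semaphore limit stated on this page


class FakeTable:
    """In-memory stand-in for a Spanner table keyed by primary key (hypothetical)."""

    def __init__(self):
        self.rows = {}

    def insert_or_update(self, key, row):
        # Upsert semantics: insert the row, or overwrite it if the key exists.
        self.rows[key] = row


async def scrape_clinic(clinic_id, table, semaphore, stats):
    # At most MAX_CONCURRENT_CLINICS coroutines are inside this block at once.
    async with semaphore:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # placeholder for the real scraping work
        table.insert_or_update(clinic_id, {"clinic_id": clinic_id, "status": "scraped"})
        stats["active"] -= 1


async def run_job(clinic_ids, table):
    """Process all clinics and return the peak number processed concurrently."""
    semaphore = asyncio.Semaphore(MAX_CONCURRENT_CLINICS)
    stats = {"active": 0, "peak": 0}
    await asyncio.gather(
        *(scrape_clinic(cid, table, semaphore, stats) for cid in clinic_ids)
    )
    return stats["peak"]


def demo():
    """Run the job twice to show that a re-run leaves the table unchanged."""
    table = FakeTable()
    clinic_ids = [f"clinic-{i}" for i in range(12)]
    first_peak = asyncio.run(run_job(clinic_ids, table))
    rows_after_first = dict(table.rows)
    second_peak = asyncio.run(run_job(clinic_ids, table))
    unchanged = rows_after_first == table.rows
    return first_peak, second_peak, unchanged, len(table.rows)
```

Running demo() twice over the same clinics leaves one row per clinic, which is the property that makes retries and repeated executions safe. With plain insert semantics, the second run would instead fail on existing keys.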
