Schedule and triggers
Scheduling, triggers, and concurrency for Scrape
Schedule
- Runs as a Cloud Run Job named scrape.
- Schedule is external (Cloud Scheduler or manual trigger).
- Expected runtime varies by clinic count; Cloud Run timeout is 2 hours.
- No new orgs or clinics should be added to scrape.
Triggers
- Manual execution via Cloud Run Jobs.
- Optional scheduler-based triggers (not defined in this repo).
Concurrency and idempotency
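- Job-level parallelism is 1, so tasks run one at a time; within a task, a semaphore caps concurrent clinic processing at 5.
- Spanner writes use insert_or_update mutations for idempotency, so re-running the job overwrites existing rows instead of duplicating them.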
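The concurrency and idempotency rules above can be illustrated with a minimal sketch. It assumes the job uses asyncio, which this page does not confirm. The names scrape_clinic, run_job, and MAX_CONCURRENT_CLINICS are hypothetical, and a plain dict keyed by clinic ID stands in for Spanner's insert_or_update behaviour; this is not the job's actual code.

```python
import asyncio

# Semaphore size taken from this page; the constant name is hypothetical.
MAX_CONCURRENT_CLINICS = 5


async def scrape_clinic(clinic_id, store, stats, sem):
    """Process one clinic while holding a semaphore slot."""
    async with sem:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        try:
            await asyncio.sleep(0.01)  # placeholder for the real scraping work
            # Stand-in for an insert_or_update mutation: writing by primary
            # key overwrites an existing row instead of creating a duplicate.
            store[clinic_id] = {"clinic_id": clinic_id, "status": "scraped"}
        finally:
            stats["active"] -= 1


async def run_job(clinic_ids, runs=2):
    """Run the scrape `runs` times to show that a rerun is idempotent."""
    sem = asyncio.Semaphore(MAX_CONCURRENT_CLINICS)
    store = {}  # in-memory stand-in for the Spanner table
    stats = {"active": 0, "peak": 0}
    for _ in range(runs):
        await asyncio.gather(
            *(scrape_clinic(c, store, stats, sem) for c in clinic_ids)
        )
    return store, stats["peak"]


store, peak = asyncio.run(run_job([f"clinic-{i}" for i in range(12)]))
```

After two full passes over 12 clinics, the store still holds one row per clinic, and the peak number of clinics processed at once never exceeds the semaphore size.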