Architecture

Pipeline design and dependencies for Scrape

Pipeline stages

  1. Load clinic and organization metadata from Spanner.
  2. Skip clinics where Clinics.scrape is false.
  3. Launch Playwright and log into EzyVet per clinic.
  4. Download CSV reports (agenda, animal, contact, financial, receipts).
  5. Transform CSVs into normalized DataFrames with Polars.
  6. Upsert records into Spanner using batch mutations (see the sketches below).
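
The following is a minimal per-clinic sketch of stages 1-5, assuming hypothetical helper names (fetch_clinics, download_reports, normalize), illustrative Spanner instance/database ids, and made-up page selectors and report links; the job's real identifiers differ. Stage 6, the Spanner upsert, is sketched under Failure handling.

  # Per-clinic pipeline sketch; ids, selectors, and report links are assumptions.
  from google.cloud import spanner
  from playwright.sync_api import sync_playwright
  import polars as pl

  client = spanner.Client()
  database = client.instance("nest").database("scrape")  # assumed ids

  def fetch_clinics():
      # Stages 1-2: read clinic metadata, keeping only clinics flagged for scraping.
      with database.snapshot() as snapshot:
          rows = snapshot.execute_sql(
              "SELECT ClinicId, EzyVetUrl FROM Clinics WHERE scrape = TRUE"
          )
          return list(rows)

  def download_reports(clinic_url, username, password, out_dir):
      # Stages 3-4: log into EzyVet with Playwright and download each CSV report.
      with sync_playwright() as p:
          browser = p.chromium.launch(headless=True)
          page = browser.new_page()
          page.goto(clinic_url)
          page.fill("#username", username)                # assumed selectors
          page.fill("#password", password)
          page.click("button[type=submit]")
          paths = {}
          for report in ("agenda", "animal", "contact", "financial", "receipts"):
              with page.expect_download() as dl:
                  page.click(f"a[data-report={report}]")  # assumed report link
              path = f"{out_dir}/{report}.csv"
              dl.value.save_as(path)
              paths[report] = path
          browser.close()
          return paths

  def normalize(csv_path):
      # Stage 5: load the raw CSV and normalize column names with Polars.
      df = pl.read_csv(csv_path)
      return df.rename({c: c.strip().lower().replace(" ", "_") for c in df.columns})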

System context

Spanner (Clinics/Organizations) -> Scrape Job
                                      |-> EzyVet UI (CSV reports)
                                      |-> Secret Manager (credentials + proxy)
                                      |-> Cloud Spanner (upserts)
                                      |-> Sentry

Dependencies

  • Upstream: EzyVet UI reports; Spanner Clinics/Organizations.
  • Downstream: Cloud Spanner normalized tables.
  • External: Secret Manager; Sentry; proxy provider (sketched below).
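
As a sketch of how the external dependencies are wired in: per-clinic EzyVet credentials and the proxy URL are read from Secret Manager before the Playwright session starts, and Sentry is initialised from a DSN. The project id and secret ids below are assumptions.

  # Reading credentials, the proxy URL, and the Sentry DSN from Secret Manager;
  # the project id and secret ids are illustrative assumptions.
  from google.cloud import secretmanager
  import sentry_sdk

  def access_secret(project_id, secret_id, version="latest"):
      client = secretmanager.SecretManagerServiceClient()
      name = f"projects/{project_id}/secrets/{secret_id}/versions/{version}"
      response = client.access_secret_version(request={"name": name})
      return response.payload.data.decode("utf-8")

  # Hypothetical secret ids:
  password = access_secret("nest-prod", "ezyvet-password-clinic-123")
  proxy_url = access_secret("nest-prod", "scrape-proxy-url")
  sentry_sdk.init(dsn=access_secret("nest-prod", "sentry-dsn"))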

Failure handling

  • Playwright logins and report downloads are retried on transient failures.
  • Spanner writes use insert_or_update mutations (idempotent).
  • The job surfaces failures only after every clinic has been attempted (see the sketch below).
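
The sketch below shows the retry wrapper, the idempotent Spanner write (stage 6), and the deferred failure surfacing; the retry parameters, table name, and process_clinic helper are assumptions.

  # Retry, idempotent upsert, and deferred failure surfacing (sketch).
  import time
  from google.cloud import spanner

  def with_retries(fn, attempts=3, delay=5):
      # Retry transient Playwright/download failures; re-raise on the last attempt.
      for attempt in range(1, attempts + 1):
          try:
              return fn()
          except Exception:
              if attempt == attempts:
                  raise
              time.sleep(delay * attempt)

  def upsert_contacts(database, df):
      # insert_or_update is idempotent: re-running the job overwrites existing
      # rows keyed by primary key instead of failing on duplicates.
      with database.batch() as batch:
          batch.insert_or_update(
              table="Contacts",            # assumed table name
              columns=df.columns,
              values=df.rows(),
          )

  def run(clinics, database):
      # Attempt every clinic, collect errors, and only fail the job at the end
      # so one bad clinic does not block the rest.
      failures = {}
      for clinic in clinics:                       # clinic: assumed metadata dict
          try:
              with_retries(lambda: process_clinic(clinic, database))  # hypothetical helper
          except Exception as exc:
              failures[clinic["clinic_id"]] = exc
      if failures:
          raise RuntimeError(f"{len(failures)} clinic(s) failed: {sorted(failures)}")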
