Scrape
Architecture
Pipeline design and dependencies for Scrape
Pipeline stages
- Load clinic and organization metadata from Spanner.
- Skip clinics where Clinics.scrape is false.
- Launch Playwright and log into EzyVet per clinic.
- Download CSV reports (agenda, animal, contact, financial, receipts).
- Transform CSVs into normalized DataFrames with Polars.
- Upsert records into Spanner using batch mutations.
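The stages above amount to a per-clinic loop. A minimal sketch follows; the Clinic dataclass and the download_reports, normalize, and upsert callables are hypothetical placeholders standing in for the real Playwright, Polars, and Spanner code:

```python
from dataclasses import dataclass


@dataclass
class Clinic:
    id: str
    scrape: bool  # mirrors the Clinics.scrape flag in Spanner


def run_pipeline(clinics, download_reports, normalize, upsert):
    """Process each clinic end to end; skip those with scrape disabled."""
    processed = []
    for clinic in clinics:
        if not clinic.scrape:
            continue  # Clinics.scrape is false: skip this clinic
        raw_csvs = download_reports(clinic)  # Playwright login + CSV downloads
        frames = normalize(raw_csvs)         # CSVs -> normalized DataFrames
        upsert(clinic, frames)               # batch mutations into Spanner
        processed.append(clinic.id)
    return processed
```

Keeping the stages as injected callables makes each one independently testable and keeps the skip logic in one place.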
System context
Spanner (Clinics/Organizations) -> Scrape Job
|-> EzyVet UI (CSV reports)
|-> Secret Manager (credentials + proxy)
|-> Cloud Spanner (upserts)
|-> Sentry
Dependencies
- Upstream: EzyVet UI reports; Spanner Clinics/Organizations.
- Downstream: Cloud Spanner normalized tables.
- External: Secret Manager; Sentry; proxy provider.
Failure handling
- Playwright login and report downloads retry on transient failures.
- Spanner writes use insert_or_update mutations (idempotent).
- The job attempts every clinic before surfacing failures, so one failing clinic does not block the rest.
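The failure handling described above can be sketched as a retry helper for transient errors plus deferred error reporting. This is an illustrative sketch, not the job's actual code; the attempt count, backoff schedule, and function names are assumptions:

```python
import time


def retry(fn, attempts=3, base_delay=1.0):
    """Retry a transient operation (e.g. a Playwright login or a report
    download) with exponential backoff before giving up."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))


def process_all(clinics, process_one, base_delay=1.0):
    """Attempt every clinic, collect errors, and surface them only at the
    end so one failing clinic does not block the rest."""
    failures = {}
    for clinic in clinics:
        try:
            retry(lambda: process_one(clinic), base_delay=base_delay)
        except Exception as exc:
            failures[clinic] = exc
    if failures:
        raise RuntimeError(f"{len(failures)} clinic(s) failed: {sorted(failures)}")
```

Because Spanner writes use insert_or_update mutations, a retried attempt that partially wrote data is safe to repeat: re-running the upsert produces the same rows.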