# Scrape

## Overview

Python 3.13 EzyVet CSV scraper into Cloud Spanner.
Scrape is a Python 3.13 Cloud Run Job that logs into EzyVet via Playwright, downloads CSV reports, transforms them with Polars, and upserts normalized records into Cloud Spanner. It runs only for clinics flagged for scraping.
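The overall flow can be sketched as below. All identifiers here (`scrape_enabled`, the flag name, the per-clinic loop body) are illustrative assumptions, not the job's actual API; the Playwright steps are elided because selectors and report URLs are deployment-specific.

```python
# Illustrative sketch of the Scrape job's high-level flow.
# Function and field names are assumptions, not the real implementation.

def scrape_enabled(clinics: list[dict]) -> list[dict]:
    """Keep only clinics flagged for scraping (hypothetical flag name)."""
    return [c for c in clinics if c.get("scrape_enabled")]

def run_job(clinics: list[dict]) -> None:
    # Playwright import kept local so the pure helper above stays
    # importable without a browser environment.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        for clinic in scrape_enabled(clinics):
            # Log into EzyVet with the clinic's credentials, download its
            # CSV reports, then transform and upsert (see later sections).
            ...
        browser.close()
```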
## Job profile
| Field | Value |
|---|---|
| Code | jobs/scrape/ |
| Package | scrape |
| Runtime | Python 3.13 (Cloud Run Job) |
| Status | Legacy (maintained) |
| Primary owner | Joe Pardi |
| Secondary owner | Akansh Divker |
| Trigger | Cloud Run Job (manual or scheduler) |
| Data source | EzyVet UI reports + Spanner metadata |
| Data sink | Cloud Spanner |
## Purpose
- Pull CSV reports from EzyVet for scrape-enabled clinics.
- Normalize report data into Nest's Spanner schema.
- Provide a legacy ingestion path for clinics without API-based ingestion.
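The normalization step maps EzyVet's CSV column headers onto Nest's Spanner column names. The real job does this with Polars; the sketch below uses the stdlib `csv` module purely for illustration, and every column name shown is a hypothetical example rather than the actual schema.

```python
import csv
import io

# Hypothetical EzyVet appointment export; real column names differ.
RAW = """Appointment ID,Patient,Start Time
101,Rex,2024-01-05 09:00
102,Milo,2024-01-05 09:30
"""

def normalize(raw_csv: str) -> list[dict]:
    """Map EzyVet CSV columns onto (assumed) Spanner column names."""
    rows = csv.DictReader(io.StringIO(raw_csv))
    return [
        {
            "AppointmentId": int(r["Appointment ID"]),
            "PatientName": r["Patient"].strip(),
            "StartTime": r["Start Time"],
        }
        for r in rows
    ]
```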
## Non-goals
- Onboarding new orgs or clinics (scrape is not expanding).
- Real-time ingestion or incremental API syncing.
- Replacing the Handler ETL pipelines.
## Lifecycle notes
- Scrape is intended to be phased out.
- Migration to Handler's EzyVet pipelines is non-trivial, so the job will be maintained until migration is complete.
## Inputs and outputs
- Inputs: Spanner Clinics/Organizations, EzyVet credentials, EzyVet CSV exports.
- Outputs: Spanner tables (households, contacts, patients, appointments, invoices, invoice lines, team members).
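An "upsert" into Spanner is typically done with the client library's `insert_or_update` mutation, which makes re-running the job for the same report idempotent. The sketch below shows that shape; the table name, column names, and the `to_mutation_rows` helper are assumptions for illustration, not the job's real schema.

```python
def to_mutation_rows(records: list[dict], columns: tuple[str, ...]) -> list[tuple]:
    """Flatten normalized dicts into column-ordered tuples for a Spanner batch."""
    return [tuple(r[c] for c in columns) for r in records]

def upsert_appointments(database, records: list[dict]) -> None:
    # `database` is a google-cloud-spanner Database; insert_or_update
    # writes new rows and overwrites existing ones by primary key.
    columns = ("AppointmentId", "PatientName", "StartTime")  # assumed schema
    with database.batch() as batch:
        batch.insert_or_update(
            table="Appointments",  # assumed table name
            columns=columns,
            values=to_mutation_rows(records, columns),
        )
```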