# Datalake

## Local development

Run, debug, and test Datalake locally.

### Prerequisites
- `uv` and Python 3.13.
- GCP credentials with access to Pub/Sub and BigQuery.
- A Pub/Sub subscription emitting Spanner change stream events.
### Required environment

```sh
export GCP_SPANNER_PROJECT_ID="..."
export GCP_BIGQUERY_PROJECT_ID="..."
export BQ_DATASET_ID="..."
```
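If the service starts without these set, failures tend to surface later as confusing Pub/Sub or BigQuery errors, so it can help to fail fast. A minimal sketch — the variable names come from the exports above, but the check itself is illustrative, not code from the service:

```python
import os

# Names taken from the required exports above.
REQUIRED_VARS = ("GCP_SPANNER_PROJECT_ID", "GCP_BIGQUERY_PROJECT_ID", "BQ_DATASET_ID")

def check_required_env() -> list[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not os.environ.get(name)]

missing = check_required_env()
if missing:
    # In a real entrypoint you would likely raise or exit here.
    print(f"Missing required environment variables: {', '.join(missing)}")
```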
### Optional environment

```sh
export LOOKBACK_HOURS=0
export MAX_MESSAGES=20
export MAX_BYTES=$((5 * 1024 * 1024))
```
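Since these are optional, the service presumably falls back to defaults when they are unset. A sketch of how such parsing might look, assuming the values shown above are the defaults (this is an illustration, not the service's actual config code):

```python
import os

# Defaults mirror the example exports above (assumption).
LOOKBACK_HOURS = int(os.environ.get("LOOKBACK_HOURS", "0"))
MAX_MESSAGES = int(os.environ.get("MAX_MESSAGES", "20"))
MAX_BYTES = int(os.environ.get("MAX_BYTES", str(5 * 1024 * 1024)))  # 5 MiB
```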
### Google Application Credentials

```sh
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"
```
### Install dependencies

```sh
# From repo root
uv sync --all-packages
```
### Run locally

```sh
# From repo root
uv run --package datalake granian --interface asgi services.datalake.main:app --host 0.0.0.0 --port 8080 --workers 1 --loop uvloop
```
### Sanity checks

```sh
curl http://localhost:8080/api/v1/health
```
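The curl check above can also be scripted, for example to wait for the server to come up before running anything against it. The endpoint path is the one shown above; the helper itself is a hypothetical sketch, not part of the service:

```python
import time
import urllib.error
import urllib.request

def wait_until_healthy(url: str, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll `url` until it returns HTTP 200 or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not up yet; retry
        time.sleep(interval)
    return False

# Example: wait_until_healthy("http://localhost:8080/api/v1/health")
```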
### Tests

```sh
# From repo root
uv run --package datalake pytest -q
```
### Debugging tips

- No messages: confirm the Pub/Sub subscription name and IAM permissions.
- BigQuery errors: confirm the dataset and tables exist, and check permissions.