
Local development

Run, debug, and test Datalake locally

Prerequisites

  • uv and Python 3.13.
  • GCP credentials with access to Pub/Sub and BigQuery.
  • A Pub/Sub subscription emitting Spanner change stream events.

Required environment

export GCP_SPANNER_PROJECT_ID="..."
export GCP_BIGQUERY_PROJECT_ID="..."
export BQ_DATASET_ID="..."

Optional environment

export LOOKBACK_HOURS=0
export MAX_MESSAGES=20
export MAX_BYTES=$((5 * 1024 * 1024))
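For reference, here is a minimal sketch of how these optional settings can be read with the defaults shown above. The variable names match the exports; the helper itself is illustrative, not the service's actual code:

```python
import os

def read_batch_settings() -> dict:
    """Read the optional tuning knobs from the environment, falling back to
    the defaults shown above when a variable is unset."""
    return {
        "lookback_hours": int(os.environ.get("LOOKBACK_HOURS", "0")),
        "max_messages": int(os.environ.get("MAX_MESSAGES", "20")),
        "max_bytes": int(os.environ.get("MAX_BYTES", str(5 * 1024 * 1024))),
    }
```

Because each value falls back to a default, none of these exports is required for a local run.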

Google Application Credentials

export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"
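A bad credentials path is easier to catch before starting the service than from an auth error at runtime. One way is a quick validation of the key file; this check is illustrative, and the fields it inspects are the standard service-account JSON keys:

```python
import json
import os

def check_service_account(path: str) -> str:
    """Validate that the key file exists, parses as JSON, and looks like a
    service-account key; return the service account email on success."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"credentials file not found: {path}")
    with open(path) as f:
        key = json.load(f)
    for field in ("type", "client_email", "private_key"):
        if field not in key:
            raise ValueError(f"missing field in key file: {field}")
    if key["type"] != "service_account":
        raise ValueError(f"unexpected credential type: {key['type']}")
    return key["client_email"]
```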

Install dependencies

# From repo root
uv sync --all-packages

Run locally

# From repo root
uv run --package datalake granian --interface asgi \
  services.datalake.main:app \
  --host 0.0.0.0 --port 8080 --workers 1 --loop uvloop

Sanity checks

curl http://localhost:8080/api/v1/health
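If the service is still starting up, a single curl can fail spuriously; polling the health endpoint is more reliable. A small sketch (the endpoint path matches the check above; the attempt count and timeouts are arbitrary):

```python
import time
import urllib.error
import urllib.request

def wait_for_health(url: str, attempts: int = 10, delay: float = 1.0) -> bool:
    """Poll `url` until it returns HTTP 200 or the attempts run out."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass
        time.sleep(delay)
    return False

# Example: wait_for_health("http://localhost:8080/api/v1/health")
```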

Tests

# From repo root
uv run --package datalake pytest -q

Debugging tips

  • No messages: confirm the Pub/Sub subscription name is correct and the service account has subscriber permissions on it.
  • BigQuery errors: confirm the dataset and tables exist in the configured project, and that the service account can write to them.
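The checks above can be run from the command line with the gcloud and bq CLIs. The subscription and dataset names here are placeholders; substitute your own:

```shell
# Confirm the subscription exists and inspect its topic and IAM policy.
gcloud pubsub subscriptions describe MY_SUBSCRIPTION \
  --project "$GCP_SPANNER_PROJECT_ID"

# Pull a single message without acking it, to confirm events are flowing.
gcloud pubsub subscriptions pull MY_SUBSCRIPTION \
  --project "$GCP_SPANNER_PROJECT_ID" --limit 1

# List tables in the target dataset, to confirm it exists and is reachable.
bq --project_id "$GCP_BIGQUERY_PROJECT_ID" ls "$BQ_DATASET_ID"
```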
