Data model

Storage, schemas, and data ownership for Datalake

Data stores

  • BigQuery: changelog tables in the datalake dataset.
  • Pub/Sub: per-table topics for downstream consumers.

BigQuery schema

Tables are named <tableName>_changelog and partitioned by record_load_date. Key fields include:

  • record_hash_key, data_hash_key
  • keys, payload
  • record_load_date
  • metadata_spanner_* fields (table name, commit timestamp, transaction IDs)
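
As an illustration of the naming and partitioning convention above, the sketch below builds a changelog table name and a query that filters on record_load_date so BigQuery can prune partitions. Only the _changelog suffix, the field names, and the partition column come from this page. The helper names, the project ID, and the assumption that record_load_date is a DATE column are hypothetical and should be checked against the real schema.

```python
import re
from datetime import date

# Hypothetical helpers for illustration; not part of the Datalake service.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def changelog_table_name(table_name: str) -> str:
    """Return the <tableName>_changelog name for a source table."""
    if not _IDENTIFIER.match(table_name):
        raise ValueError(f"unexpected table name: {table_name!r}")
    return f"{table_name}_changelog"


def changelog_query(project: str, dataset: str, table_name: str, load_date: date) -> str:
    """Build a query restricted to a single record_load_date partition.

    Assumes record_load_date is a DATE column (not confirmed by this page).
    """
    table = f"{project}.{dataset}.{changelog_table_name(table_name)}"
    return (
        # Column names are backtick-quoted defensively.
        "SELECT `record_hash_key`, `data_hash_key`, `keys`, `payload` "
        f"FROM `{table}` "
        f"WHERE record_load_date = DATE '{load_date.isoformat()}'"
    )
```

Filtering on the partition column keeps the scan limited to the requested load date instead of the whole changelog table.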

Schema source of truth

  • BigQuery table schema is defined in services/datalake/utils/bigquery_utils.py.
  • Table creation is handled by data tooling or infrastructure; the service assumes tables already exist.
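
Because the service assumes the tables already exist, a pre-deployment check can be useful. The following is a minimal sketch, not the service's own logic: the function name and the idea of running such a check are assumptions. It compares the expected <tableName>_changelog names against the table IDs present in the dataset.

```python
from typing import Iterable, List, Set


def missing_changelog_tables(
    source_tables: Iterable[str], existing_table_ids: Set[str]
) -> List[str]:
    """Return expected <tableName>_changelog tables absent from the dataset.

    Hypothetical helper: existing_table_ids is whatever set of table IDs
    the caller has already listed from the dataset.
    """
    expected = (f"{name}_changelog" for name in source_tables)
    return sorted(t for t in expected if t not in existing_table_ids)


existing = {"users_changelog", "orders_changelog"}
missing = missing_changelog_tables(["users", "orders", "devices"], existing)
print(missing)  # ['devices_changelog']
```

In practice the set of existing IDs could be collected with the google-cloud-bigquery client, for example from the table_id attribute of the items returned by Client.list_tables.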

Retention and lifecycle

  • No explicit TTL is enforced by the service.
  • Dataset retention is managed in BigQuery configuration.
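
For reference, one way retention can be expressed in BigQuery configuration is a partition expiration on a changelog table. The sketch below only generates the DDL statement. Whether Datalake uses table-level partition expiration, and what retention window applies, is not stated on this page, so the helper name and the values are hypothetical.

```python
def partition_expiration_ddl(project: str, dataset: str, table_name: str, days: int) -> str:
    """Build BigQuery DDL that sets a partition expiration on a changelog table.

    Hypothetical helper; the retention window is an assumption, not a
    documented Datalake setting.
    """
    if days <= 0:
        raise ValueError("days must be positive")
    return (
        f"ALTER TABLE `{project}.{dataset}.{table_name}_changelog` "
        f"SET OPTIONS (partition_expiration_days = {days})"
    )
```

With such an option set, BigQuery itself drops partitions older than the configured number of days, which is consistent with the service enforcing no TTL of its own.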
