Datalake
Data model
Storage, schemas, and data ownership for Datalake
Data stores
- BigQuery: changelog tables in the datalake dataset.
- Pub/Sub: per-table topics for downstream consumers.
BigQuery schema
Tables are named <tableName>_changelog and partitioned by record_load_date.
Key fields include:
- record_hash_key, data_hash_key
- keys, payload
- record_load_date
- metadata_spanner_* fields (table name, commit timestamp, transaction ids)
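As an illustration of the naming and partitioning convention, the sketch below derives a changelog table name and builds a query restricted to a single record_load_date partition. The helper names and the example table are hypothetical and are not part of the Datalake service. It also assumes that record_load_date is a DATE column and that the dataset is addressed as datalake.

```python
import re
from datetime import date

# Hypothetical helpers for illustration; not part of the Datalake service.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def changelog_table_name(table_name: str) -> str:
    """Return the <tableName>_changelog name for a source table."""
    # Reject anything that is not a plain identifier, since the name is
    # interpolated into SQL text below.
    if not _IDENTIFIER.match(table_name):
        raise ValueError(f"unsafe table name: {table_name!r}")
    return f"{table_name}_changelog"


def build_changelog_query(
    table_name: str, load_date: date, dataset: str = "datalake"
) -> str:
    """Build a query that reads one record_load_date partition."""
    table = changelog_table_name(table_name)
    # Filtering on the partition column lets BigQuery skip other partitions.
    # Assumption: record_load_date is a DATE column.
    return (
        "SELECT record_hash_key, data_hash_key, payload\n"
        f"FROM `{dataset}.{table}`\n"
        f"WHERE record_load_date = DATE '{load_date.isoformat()}'"
    )


if __name__ == "__main__":
    # "orders" is a made-up source table name.
    print(build_changelog_query("orders", date(2024, 1, 15)))
```

Filtering on record_load_date keeps each query scoped to the partitions it needs, which is usually the main reason to partition a changelog table by load date.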
Schema source of truth
- BigQuery table schema is defined in services/datalake/utils/bigquery_utils.py.
- Table creation is handled by data tooling or infrastructure; the service assumes tables already exist.
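Because the service assumes tables already exist, a pre-flight check run by the owning tooling can catch a missing table before writes fail. The following is a minimal sketch of such a check, not something the service is known to do. The function name and the example table names are hypothetical. The list of existing table ids is assumed to come from BigQuery, for example from the table_id of each item returned by the Python client's Client.list_tables.

```python
from typing import Iterable, List


def missing_changelog_tables(
    source_tables: Iterable[str], existing_table_ids: Iterable[str]
) -> List[str]:
    """Return expected <tableName>_changelog tables absent from the dataset.

    Hypothetical helper for illustration; not part of the Datalake service.
    """
    existing = set(existing_table_ids)
    expected = (f"{name}_changelog" for name in source_tables)
    # Sort so the report is stable regardless of input order.
    return sorted(table for table in expected if table not in existing)


if __name__ == "__main__":
    # Made-up source tables and dataset contents.
    missing = missing_changelog_tables(
        ["orders", "users"], ["orders_changelog", "unrelated_table"]
    )
    print(missing)  # ['users_changelog']
```

Keeping the comparison as a pure function separates it from the BigQuery call, so it can be tested without credentials.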
Retention and lifecycle
- No explicit TTL is enforced by the service.
- Dataset retention is managed in BigQuery configuration.