Nest Engineering Docs
Stream

Architecture

System design and data flow for Stream

Stream is a Python 3.13 FastAPI service that turns Spanner Change Stream events into Cloud Tasks for downstream ETL processing.

System context

Spanner Change Streams -> Dataflow -> Pub/Sub (push) -> Stream (FastAPI)
                                                  |-> Spanner (pipeline metadata)
                                                  |-> Cloud Tasks -> Handler (internal)
                                                  |-> Secret Manager / Sentry

Request flow

  1. Dataflow emits change stream events to Pub/Sub.
  2. Pub/Sub pushes the event to POST /api/v1/pipeline/stream.
  3. Stream validates the event and queries Spanner for pipeline metadata.
  4. It computes per-pipeline task delays and selects the correct queue based on PIMS.
  5. Cloud Tasks are created with idempotent names derived from commit timestamps.

Components

  • services/stream/main.py: app setup, Spanner lifecycle, task scheduler init.
  • services/stream/api/v1/endpoints/pipeline.py: Pub/Sub push endpoint.
  • services/stream/core/change_stream_processor.py: event parsing + TaskInfo creation.
  • services/stream/core/cloud_tasks_scheduler.py: queue selection + task creation.
  • packages/python/common/: Spanner client, Cloud Tasks helper, secrets.

Reliability and scaling

  • Tasks are idempotent (commit timestamp + pipeline identifiers).
  • Duplicate Cloud Tasks are detected and skipped.
  • Scheduling retries use exponential backoff for transient errors.

Last updated on