Stream
Architecture
System design and data flow for Stream
Stream is a Python 3.13 FastAPI service that turns Spanner Change Stream events into Cloud Tasks for downstream ETL processing.
System context
Spanner Change Streams -> Dataflow -> Pub/Sub (push) -> Stream (FastAPI)
|-> Spanner (pipeline metadata)
|-> Cloud Tasks -> Handler (internal)
|-> Secret Manager / SentryRequest flow
- Dataflow emits change stream events to Pub/Sub.
- Pub/Sub pushes the event to
POST /api/v1/pipeline/stream. - Stream validates the event and queries Spanner for pipeline metadata.
- It computes per-pipeline task delays and selects the correct queue based on PIMS.
- Cloud Tasks are created with idempotent names derived from commit timestamps.
Components
services/stream/main.py: app setup, Spanner lifecycle, task scheduler init.services/stream/api/v1/endpoints/pipeline.py: Pub/Sub push endpoint.services/stream/core/change_stream_processor.py: event parsing + TaskInfo creation.services/stream/core/cloud_tasks_scheduler.py: queue selection + task creation.packages/python/common/: Spanner client, Cloud Tasks helper, secrets.
Reliability and scaling
- Tasks are idempotent (commit timestamp + pipeline identifiers).
- Duplicate Cloud Tasks are detected and skipped.
- Scheduling retries use exponential backoff for transient errors.
Last updated on