Cloud Data Platform v1→v2→v3
This example demonstrates a complete cloud data platform that evolves across three architectural layers. Use the layer pills to toggle between the MVP batch pipeline (v1), the streaming-enhanced architecture (v2), and the full production lakehouse (v3).
What each layer shows
v1 — MVP Batch Pipeline
The initial architecture ingests data nightly from operational databases using Airbyte, lands raw files in S3, and transforms them with dbt running on a scheduled Airflow DAG. Reports are served from Redshift Serverless.
- Ingestion: Airbyte pulls from Postgres and MySQL via JDBC
- Storage: S3 (raw zone) with no partitioning strategy
- Transform: dbt models run by Airflow on a 6-hour schedule
- Serve: Redshift Serverless, Metabase dashboards
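The v1 flow can be sketched as a single pass over three stages. This is plain Python for illustration only; the function, bucket path, and table shape are hypothetical stand-ins, not the actual Airbyte, dbt, or Redshift APIs:

```python
# Minimal sketch of the v1 batch cycle: ingest -> raw S3 zone -> dbt-style
# transform -> serving table. All names here are illustrative.

def nightly_batch_cycle(source_rows: list[dict]) -> dict:
    """Model one batch run end to end."""
    # 1. Ingest: an Airbyte-style full pull lands raw rows as-is in S3.
    raw_zone = {"s3://lake/raw/orders.json": source_rows}  # hypothetical key

    # 2. Transform: a dbt-style model aggregates the raw rows.
    rows = raw_zone["s3://lake/raw/orders.json"]
    revenue_by_customer: dict[str, float] = {}
    for row in rows:
        revenue_by_customer[row["customer"]] = (
            revenue_by_customer.get(row["customer"], 0.0) + row["amount"]
        )

    # 3. Serve: the final table Redshift would expose to Metabase.
    return revenue_by_customer

print(nightly_batch_cycle([
    {"customer": "acme", "amount": 10.0},
    {"customer": "acme", "amount": 5.0},
    {"customer": "globex", "amount": 7.5},
]))
```

The point of the sketch is the shape of the flow: raw data is landed untouched first, and all modeling happens downstream in the transform step.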
v2 — Streaming Layer Added
The v2 layer adds a real-time path alongside the existing batch pipeline. Debezium captures change events from the Postgres write-ahead log (WAL) and publishes them to Kafka. Flink jobs read from Kafka and write directly to Iceberg tables on S3.
- CDC: Debezium → Kafka (pulse edges)
- Stream processing: Flink reads Kafka, writes Iceberg
- Lakehouse format: Iceberg replaces raw S3 files for the real-time path
- Batch still exists: Airbyte + dbt remain for historical backfill
The v2 Flink → Iceberg → Trino path enables sub-minute freshness for key metrics, while the batch path provides full historical accuracy with dbt test coverage.
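What travels down that path is a change envelope. The sketch below applies Debezium-style events (the `op`/`before`/`after` field convention is Debezium's) to an in-memory dict standing in for an Iceberg table; `apply_change` is a hypothetical helper, not a real Flink API:

```python
# Illustrative sketch of the v2 real-time path: one CDC envelope per change,
# upserted into an in-memory stand-in for the Iceberg table.

def apply_change(table: dict, event: dict) -> dict:
    """Apply one CDC event (op c=create, u=update, d=delete)."""
    op = event["op"]
    if op in ("c", "u"):
        row = event["after"]
        table[row["id"]] = row          # upsert keyed on the primary key
    elif op == "d":
        table.pop(event["before"]["id"], None)
    return table

table: dict = {}
apply_change(table, {"op": "c", "before": None,
                     "after": {"id": 1, "status": "new"}})
apply_change(table, {"op": "u", "before": {"id": 1, "status": "new"},
                     "after": {"id": 1, "status": "paid"}})
print(table)
```

Because each event carries the full `after` image, the stream path can keep the serving table current without rereading the source, which is what makes sub-minute freshness possible.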
v3 — Full Production Lakehouse
v3 activates the observability stack (OpenTelemetry, Prometheus, Grafana), adds ClickHouse for high-concurrency OLAP serving, and introduces a data catalog (Apache Atlas or DataHub) connected to all storage layers.
- OLAP serving: ClickHouse for ad-hoc analytics at high concurrency
- Catalog: Data catalog indexes S3, Iceberg, and Redshift metadata
- Observability: OpenTelemetry collector → Prometheus → Grafana
- Data quality: Great Expectations runs as part of every dbt invocation
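The kind of check Great Expectations contributes can be sketched in a few lines. This is plain Python for illustration, modeled loosely on a GE-style not-null expectation, not the actual Great Expectations API:

```python
# Sketch of a data-quality gate run alongside dbt: check a column for nulls
# and report a result object with a success flag and failing-row count.

def expect_column_values_not_null(rows: list[dict], column: str) -> dict:
    """Return a result in the success/unexpected_count style GE uses."""
    failures = [r for r in rows if r.get(column) is None]
    return {"success": not failures, "unexpected_count": len(failures)}

rows = [{"order_id": 1}, {"order_id": None}, {"order_id": 3}]
print(expect_column_values_not_null(rows, "order_id"))
```

Running checks like this on every dbt invocation means a bad load fails the pipeline instead of silently reaching the serving layer.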
Animation scenarios
v1 — Nightly Batch Run
Walks through a complete nightly batch cycle: Airflow triggers Airbyte to pull from sources, raw files land in S3, dbt transforms and tests the data, Redshift loads the final tables, Metabase queries refresh.
v2 — Real-Time Order Event
Shows a single order write in Postgres propagating through Debezium → Kafka → Flink → Iceberg → Trino within seconds, while the same event is also picked up by the notification service.
v3 — SLA Breach Alert
Demonstrates the observability path: a Flink job lag spike is detected by OpenTelemetry, routed to Prometheus, triggers a Grafana alert, and wakes an on-call engineer — all modeled as a packet flow through the monitoring stack.
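The alert logic in this scenario boils down to a sustained-threshold check, similar in spirit to a Prometheus rule with a `for:` clause. The threshold and window below are illustrative values, not taken from a real alert configuration:

```python
# Sketch of the v3 SLA-breach detection: fire only when consumer lag stays
# above the threshold for several consecutive samples, so a single spike
# does not page the on-call engineer.

def lag_alert(samples: list[int], threshold: int = 10_000, for_n: int = 3) -> bool:
    """True when the last `for_n` samples all exceed the threshold."""
    recent = samples[-for_n:]
    return len(recent) == for_n and all(s > threshold for s in recent)

healthy = [120, 300, 250, 400]
breaching = [120, 15_000, 22_000, 30_000]
print(lag_alert(healthy), lag_alert(breaching))
```

Requiring the breach to be sustained is the design choice worth noting: it trades a few minutes of detection latency for far fewer false pages.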
Key design decisions
Why Debezium opens the connection to Postgres, not the other way around: Debezium is a Kafka Connect source connector running in the Connect worker cluster. It opens a replication slot connection to Postgres (connect('debezium', 'pg-primary')) — Postgres does not push WAL events to anyone.
Why Flink connects to Kafka, not Kafka to Flink: Kafka consumers always dial the broker. The broker never initiates connections to consumers. This is why all Kafka consumer edges point from the consumer service toward the Kafka topic nodes.
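Both direction rules can be written in the same `connect()` style the Debezium example uses. The `flink` and `kafka` node ids below are assumptions about this diagram's naming, shown only to illustrate the edge direction:

```
connect('debezium', 'pg-primary')  # Debezium dials Postgres (replication slot)
connect('flink', 'kafka')          # the consumer dials the broker, never the reverse
```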
Why ClickHouse is a separate node from Redshift in v3: ClickHouse is added in v3 to handle high-concurrency OLAP workloads that Redshift Serverless does not serve cost-effectively at scale. Both coexist during the migration phase.
Running locally
To load this example in your local dev instance:
- Open http://localhost:3000
- Click Examples in the left sidebar
- Select Cloud Data Platform from the Architecture Examples category
- Click Run
- Use the v1, v2, v3 pills in the top-right corner to toggle layers
- Click Play in the AnimationPlayer to watch each scenario