Cloud Data Platform v1→v2→v3
This example demonstrates a complete cloud data platform that evolves across three architectural layers. Use the layer pills to toggle between the MVP batch pipeline (v1), the streaming-enhanced architecture (v2), and the full production lakehouse (v3).
What each layer shows
v1 — MVP Batch Pipeline
The initial architecture ingests data nightly from operational databases using Airbyte, lands raw files in S3, and transforms them with dbt running on a scheduled Airflow DAG. Reports are served from Redshift Serverless.
- Ingestion: Airbyte pulls from Postgres and MySQL via JDBC
- Storage: S3 (raw zone) with no partitioning strategy
- Transform: dbt models run by Airflow on a 6-hour schedule
- Serve: Redshift Serverless, Metabase dashboards
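The v1 flow can be sketched as a single pass over three stages. This is plain Python for illustration only; the function, bucket path, and table shape are hypothetical stand-ins, not the actual Airbyte, dbt, or Redshift APIs:

```python
# Minimal sketch of the v1 batch cycle: ingest -> raw S3 zone -> dbt-style
# transform -> serving table. All names here are illustrative.

def nightly_batch_cycle(source_rows: list[dict]) -> dict:
    """Model one batch run end to end."""
    # 1. Ingest: an Airbyte-style full pull lands raw rows as-is in S3.
    raw_zone = {"s3://lake/raw/orders.json": source_rows}  # hypothetical key

    # 2. Transform: a dbt-style model aggregates the raw rows.
    rows = raw_zone["s3://lake/raw/orders.json"]
    revenue_by_customer: dict[str, float] = {}
    for row in rows:
        revenue_by_customer[row["customer"]] = (
            revenue_by_customer.get(row["customer"], 0.0) + row["amount"]
        )

    # 3. Serve: the final table Redshift would expose to Metabase.
    return revenue_by_customer

print(nightly_batch_cycle([
    {"customer": "acme", "amount": 10.0},
    {"customer": "acme", "amount": 5.0},
    {"customer": "globex", "amount": 7.5},
]))
```

The point of the sketch is the shape of the flow: raw data is landed untouched first, and all modeling happens downstream in the transform step.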
v2 — Streaming Layer Added
The v2 layer adds a real-time path alongside the existing batch pipeline. Debezium captures change events from the Postgres write-ahead log (WAL) and publishes them to Kafka. Flink jobs read from Kafka and write directly to Iceberg tables on S3.
- CDC: Debezium → Kafka (pulse edges)
- Stream processing: Flink reads Kafka, writes Iceberg
- Lakehouse format: Iceberg replaces raw S3 files for the real-time path
- Batch still exists: Airbyte + dbt remain for historical backfill
The v2 Flink → Iceberg → Trino path enables sub-minute freshness for key metrics, while the batch path provides full historical accuracy with dbt test coverage.
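What travels down that path is a change envelope. The sketch below applies Debezium-style events (the `op`/`before`/`after` field convention is Debezium's) to an in-memory dict standing in for an Iceberg table; `apply_change` is a hypothetical helper, not a real Flink API:

```python
# Illustrative sketch of the v2 real-time path: one CDC envelope per change,
# upserted into an in-memory stand-in for the Iceberg table.

def apply_change(table: dict, event: dict) -> dict:
    """Apply one CDC event (op c=create, u=update, d=delete)."""
    op = event["op"]
    if op in ("c", "u"):
        row = event["after"]
        table[row["id"]] = row          # upsert keyed on the primary key
    elif op == "d":
        table.pop(event["before"]["id"], None)
    return table

table: dict = {}
apply_change(table, {"op": "c", "before": None,
                     "after": {"id": 1, "status": "new"}})
apply_change(table, {"op": "u", "before": {"id": 1, "status": "new"},
                     "after": {"id": 1, "status": "paid"}})
print(table)
```

Because each event carries the full `after` image, the stream path can keep the serving table current without rereading the source, which is what makes sub-minute freshness possible.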
v3 — Full Production Lakehouse
v3 activates the observability stack (OpenTelemetry, Prometheus, Grafana), adds ClickHouse for high-concurrency OLAP serving, and introduces a data catalog (Apache Atlas or DataHub) connected to all storage layers.
- OLAP serving: ClickHouse for ad-hoc analytics at high concurrency
- Catalog: Data catalog indexes S3, Iceberg, and Redshift metadata
- Observability: OpenTelemetry collector → Prometheus → Grafana
- Data quality: Great Expectations runs as part of every dbt invocation
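The kind of check Great Expectations contributes can be sketched in a few lines. This is plain Python for illustration, modeled loosely on a GE-style not-null expectation, not the actual Great Expectations API:

```python
# Sketch of a data-quality gate run alongside dbt: check a column for nulls
# and report a result object with a success flag and failing-row count.

def expect_column_values_not_null(rows: list[dict], column: str) -> dict:
    """Return a result in the success/unexpected_count style GE uses."""
    failures = [r for r in rows if r.get(column) is None]
    return {"success": not failures, "unexpected_count": len(failures)}

rows = [{"order_id": 1}, {"order_id": None}, {"order_id": 3}]
print(expect_column_values_not_null(rows, "order_id"))
```

Running checks like this on every dbt invocation means a bad load fails the pipeline instead of silently reaching the serving layer.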
Animation scenarios
v1 — Nightly Batch Run
Walks through a complete nightly batch cycle: Airflow triggers Airbyte to pull from sources, raw files land in S3, dbt transforms and tests the data, Redshift loads the final tables, Metabase queries refresh.
v2 — Real-Time Order Event
Shows a single order write in Postgres propagating through Debezium → Kafka → Flink → Iceberg → Trino within seconds, while the same event is also picked up by the notification service.
v3 — SLA Breach Alert
Demonstrates the observability path: a Flink job lag spike is detected by OpenTelemetry, routed to Prometheus, triggers a Grafana alert, and wakes an on-call engineer — all modeled as a packet flow through the monitoring stack.
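The alert logic in this scenario boils down to a sustained-threshold check, similar in spirit to a Prometheus rule with a `for:` clause. The threshold and window below are illustrative values, not taken from a real alert configuration:

```python
# Sketch of the v3 SLA-breach detection: fire only when consumer lag stays
# above the threshold for several consecutive samples, so a single spike
# does not page the on-call engineer.

def lag_alert(samples: list[int], threshold: int = 10_000, for_n: int = 3) -> bool:
    """True when the last `for_n` samples all exceed the threshold."""
    recent = samples[-for_n:]
    return len(recent) == for_n and all(s > threshold for s in recent)

healthy = [120, 300, 250, 400]
breaching = [120, 15_000, 22_000, 30_000]
print(lag_alert(healthy), lag_alert(breaching))
```

Requiring the breach to be sustained is the design choice worth noting: it trades a few minutes of detection latency for far fewer false pages.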
Key design decisions
Why Debezium opens the connection to Postgres, not the other way around: Debezium is a Kafka Connect source connector running in the Connect worker cluster. It opens a replication slot connection to Postgres (connect('debezium', 'pg-primary')) — Postgres does not push WAL events to anyone.
Why Flink connects to Kafka, not Kafka to Flink: Kafka consumers always dial the broker. The broker never initiates connections to consumers. This is why all Kafka consumer edges point from the consumer service toward the Kafka topic nodes.
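Both direction rules can be written in the same `connect()` style the Debezium example uses. The `flink` and `kafka` node ids below are assumptions about this diagram's naming, shown only to illustrate the edge direction:

```
connect('debezium', 'pg-primary')  # Debezium dials Postgres (replication slot)
connect('flink', 'kafka')          # the consumer dials the broker, never the reverse
```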
Why ClickHouse is a separate node from Redshift in v3: ClickHouse is added in v3 to handle high-concurrency OLAP workloads that Redshift Serverless does not serve cost-effectively at scale. Both coexist during the migration phase.
Running locally
To load this example in your local dev instance:
- Open http://localhost:3000
- Click Examples in the left sidebar
- Select Cloud Data Platform from the Architecture Examples category
- Click Run
- Use the v1, v2, v3 pills in the top-right corner to toggle layers
- Click Play in the AnimationPlayer to watch each scenario