Modern Lakehouse Concepts & Interoperability Last updated: May 29, 2026

CDC Log Ingestion Pipelines

Data pipelines that capture database transaction logs and apply those insert, update, and delete events to lakehouse tables in real time.


Change Data Capture (CDC) log ingestion pipelines are real-time ingestion streams that replicate changes from source transactional databases (such as PostgreSQL, MySQL, or Oracle) to a data lakehouse. Rather than periodically running SQL queries that scan the source tables, which adds query load to the production database, CDC pipelines read the database’s internal transaction log directly.
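To make the contrast concrete, here is a minimal toy sketch in Python. It assumes a simplified in-memory log; the `LogEntry` type, its `lsn` field, and both helper functions are hypothetical names for illustration, not any database's actual API. The log reader picks up only entries written after its last processed position, while the polling alternative has to scan every current row on each pull.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LogEntry:
    lsn: int              # hypothetical, monotonically increasing log position
    op: str               # "insert", "update", or "delete"
    key: int              # primary key of the affected row
    row: Optional[dict]   # new row image; None for deletes


def read_log_since(log: list[LogEntry], last_lsn: int) -> list[LogEntry]:
    """Log-based capture: return only the entries written after last_lsn."""
    return [entry for entry in log if entry.lsn > last_lsn]


def poll_full_table(table: dict[int, dict]) -> list[dict]:
    """Polling alternative: every pull scans all current rows."""
    return list(table.values())


# A toy transaction log and the table state it produces.
log = [
    LogEntry(1, "insert", 10, {"id": 10, "status": "new"}),
    LogEntry(2, "update", 10, {"id": 10, "status": "paid"}),
    LogEntry(3, "insert", 11, {"id": 11, "status": "new"}),
    LogEntry(4, "delete", 10, None),
]
table_now = {11: {"id": 11, "status": "new"}}

# The reader has already processed entries up to position 2.
new_entries = read_log_since(log, last_lsn=2)
print([(e.lsn, e.op, e.key) for e in new_entries])
# [(3, 'insert', 11), (4, 'delete', 10)]

# The poll returns current rows only; the delete of key 10 never appears in it.
print(poll_full_table(table_now))
# [{'id': 11, 'status': 'new'}]
```

In this toy model the log reader's work scales with the number of new changes, whereas the poll's work scales with the size of the table.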

The Pipeline Architecture

A typical CDC ingestion pipeline consists of four components:

  1. Source Transaction Log: The database log (e.g., PostgreSQL WAL or MySQL binlog) that records all DML events.
  2. CDC Capture Engine: A service (like Debezium) that reads the log, parses the events, and formats them into standardized JSON or Avro messages.
  3. Event Stream Message Bus: A streaming broker (like Apache Kafka or Redpanda) that buffers the change messages.
  4. Ingestion Writer Engine: A streaming compute engine (like Apache Flink or Spark Structured Streaming) that reads messages from the bus and writes them to the lakehouse.

End to end, the data flows as follows:
  Source DB ──> Transaction Log ──> CDC Engine ──> Kafka ──> Ingest Engine ──> Iceberg
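
As a rough illustration of the last two stages, the following Python sketch applies a handful of change messages to a target. It assumes a simplified Debezium-style envelope with `op`, `before`, and `after` fields (real messages also carry source metadata and, depending on converter settings, a schema wrapper). The `apply_change` helper and the plain dictionary standing in for the lakehouse table are hypothetical; a production writer would be a Flink or Spark job committing to Iceberg.

```python
import json


def apply_change(table: dict, message: str, key_field: str = "id") -> None:
    """Apply one change event to an in-memory stand-in for the target table."""
    event = json.loads(message)
    op = event["op"]
    if op in ("c", "r", "u"):
        # create, snapshot read, and update all upsert the new row image
        row = event["after"]
        table[row[key_field]] = row
    elif op == "d":
        # delete events carry the old row image in "before"
        table.pop(event["before"][key_field], None)
    else:
        raise ValueError(f"unsupported op: {op!r}")


# Messages as they might arrive from the message bus, in log order.
messages = [
    json.dumps({"op": "c", "before": None,
                "after": {"id": 1, "email": "a@example.com"}}),
    json.dumps({"op": "u", "before": {"id": 1, "email": "a@example.com"},
                "after": {"id": 1, "email": "b@example.com"}}),
    json.dumps({"op": "c", "before": None,
                "after": {"id": 2, "email": "c@example.com"}}),
    json.dumps({"op": "d", "before": {"id": 2, "email": "c@example.com"},
                "after": None}),
]

target = {}
for message in messages:
    apply_change(target, message)

print(target)
# {1: {'id': 1, 'email': 'b@example.com'}}
```

The sketch relies on events for a given key arriving in log order; applying them out of order could leave the target in a state the source never had.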

Writing CDC Events to Iceberg

Ingestion engines write CDC events to Apache Iceberg tables using one of two strategies:
