# Apache Iceberg vs Delta Lake
Apache Iceberg and Delta Lake are the two dominant open table formats for cloud data lakehouses, and choosing between them is one of the most common architectural questions in modern data engineering. The right choice depends on your existing ecosystem, governance requirements, and how important multi-engine portability is to your architecture.
## Origins and Governance
| Aspect | Apache Iceberg | Delta Lake |
|---|---|---|
| Created by | Netflix (2017) | Databricks (2019) |
| Governance | Apache Software Foundation | Linux Foundation Delta Lake Project |
| Open source | Yes (Apache 2.0) | Yes (Apache 2.0) |
| Specification | Open, community-governed spec | Controlled primarily by Databricks |
| Vendor neutrality | Fully vendor-neutral | Databricks-first (UniForm for interop) |
## Core Architecture Differences
### Transaction Log
Delta Lake: Stores a transaction log in a _delta_log/ directory as sequentially numbered JSON commit files (00000.json, 00001.json, …) plus periodic Parquet checkpoint files. The log is a flat sequence of operations that readers replay to reconstruct the current table state.
Apache Iceberg: Stores metadata as a hierarchy: metadata JSON → manifest list → manifest files → data files. Each snapshot is a complete, self-describing metadata tree with rich per-file statistics.
Iceberg’s approach enables:
- Richer file statistics: Per-file min/max, null counts stored in manifests (usable for skipping without reading data files).
- Faster metadata operations: Manifest-level partition elimination before reading any data.
- Cleaner snapshot semantics: Each snapshot is a complete, addressable state — no need to “replay” a log to find current state.
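As a minimal sketch of what that metadata tree looks like in practice, the snippet below walks snapshots and per-file statistics with PyIceberg; the catalog name, endpoint, and table identifier are hypothetical.

```python
# Sketch: inspecting Iceberg's snapshot/manifest metadata with PyIceberg.
# The catalog name, REST endpoint, and table identifier are hypothetical.
from pyiceberg.catalog import load_catalog

catalog = load_catalog("lakehouse", uri="http://localhost:8181")  # REST catalog
table = catalog.load_table("db.events")

# Every snapshot is a complete, addressable state; no log replay is needed.
for snap in table.snapshots():
    print(snap.snapshot_id, snap.timestamp_ms, snap.manifest_list)

# Per-file statistics (record counts, value bounds, null counts) live in the
# manifests and can be scanned without opening any data files.
for f in table.inspect.files().to_pylist():
    print(f["file_path"], f["record_count"])
```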
Delta Lake’s log approach is simpler to implement but can accumulate many JSON files in high-commit scenarios (mitigated by checkpointing).
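For comparison, a similar sketch against a Delta table's log using the deltalake (delta-rs) package; the table path is hypothetical.

```python
# Sketch: inspecting a Delta table's transaction log with the deltalake package.
# Reading table state replays _delta_log/ commits from the latest Parquet
# checkpoint forward; the table path is hypothetical.
from deltalake import DeltaTable

dt = DeltaTable("/data/warehouse/events")
print(dt.version())           # index of the latest commit JSON in _delta_log/
print(dt.history(limit=5))    # recent commits, reconstructed from the log
print(dt.files()[:5])         # current data files resolved from log + checkpoints
```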
### Catalog and Multi-Engine Support
Apache Iceberg: Built around a fully abstracted, engine-neutral catalog API. Any catalog implementation (Hive Metastore, AWS Glue, Nessie, Apache Polaris, JDBC) plugs into the same catalog interface, and multiple engines connect via the Iceberg REST Catalog specification — a published, open API standard.
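To illustrate the engine-neutral model, here is a hedged sketch of pointing Spark at an Iceberg REST catalog. It assumes the iceberg-spark-runtime package is on the classpath; the catalog name, endpoint, and warehouse path are hypothetical.

```python
# Sketch: registering an Iceberg REST catalog in Spark (PySpark). Any engine that
# speaks the REST Catalog spec (Trino, Flink, PyIceberg, ...) can point at the
# same endpoint. Catalog name, endpoint, and warehouse path are hypothetical.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("iceberg-rest-demo")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.lakehouse", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lakehouse.type", "rest")
    .config("spark.sql.catalog.lakehouse.uri", "http://localhost:8181")
    .config("spark.sql.catalog.lakehouse.warehouse", "s3://bucket/warehouse")
    .getOrCreate()
)

spark.sql("SELECT * FROM lakehouse.db.events LIMIT 10").show()
```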
Delta Lake: Natively integrated with Databricks Unity Catalog. Multi-engine support was added later via UniForm, which has Delta tables automatically generate Iceberg metadata, but external engines are read-only on UniForm tables.
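On the Delta side, UniForm is enabled per table through table properties. The sketch below follows the property names in the Databricks UniForm documentation and reuses the Spark session from the previous sketch, here assumed to run on a Databricks Runtime; the table name is hypothetical.

```python
# Sketch: enabling UniForm on an existing Delta table so it also publishes Iceberg
# metadata that external engines can read (read-only). Table name is hypothetical.
spark.sql("""
    ALTER TABLE main.analytics.events SET TBLPROPERTIES (
        'delta.enableIcebergCompatV2'          = 'true',
        'delta.universalFormat.enabledFormats' = 'iceberg'
    )
""")
```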
### Row-Level Deletes
Apache Iceberg: First-class, spec-level support for row-level deletes via:
- Positional delete files (Spec v2).
- Equality delete files (Spec v2).
- Deletion vectors (Spec v3 — compact, bitmap-based).
Delta Lake: Row-level deletes via deletion vectors (Delta 2.3+) stored as compact bitmaps. Functionally similar to Iceberg’s deletion-vector approach, but tracked through the Delta transaction log.
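As a small sketch of the Iceberg side, the snippet below issues a row-level DELETE through Spark SQL, reusing the Spark session and hypothetical table from earlier; with merge-on-read write modes, the engine records delete files rather than rewriting the affected data files.

```python
# Sketch: row-level DELETE on an Iceberg v2 table. With merge-on-read write modes,
# deletes are recorded as delete files (or deletion vectors under spec v3) instead
# of eagerly rewriting data files. Table and column names are hypothetical.
spark.sql("""
    ALTER TABLE lakehouse.db.events SET TBLPROPERTIES (
        'write.delete.mode' = 'merge-on-read',
        'write.update.mode' = 'merge-on-read',
        'write.merge.mode'  = 'merge-on-read'
    )
""")
spark.sql("DELETE FROM lakehouse.db.events WHERE user_id = 42")
```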
## Feature Comparison
| Feature | Apache Iceberg | Delta Lake |
|---|---|---|
| Time travel | Yes (snapshot-based) | Yes (version-based) |
| Schema evolution | Full (column IDs) | Full |
| Partition evolution | Yes (metadata-only) | Partial (requires rewrites) |
| Hidden partitioning | Yes | No |
| Multi-engine reads | Excellent (REST Catalog) | Good (UniForm — read-only) |
| Multi-engine writes | Excellent | Limited (Databricks primary writer) |
| Branching and tagging | Yes (table-level branches and tags) | No table-level support (catalog-level only, via Unity) |
| Open catalog standard | REST Catalog spec | Unity Catalog API (proprietary) |
| Streaming support | Yes (Flink, Spark Streaming) | Yes (Spark Streaming, DLT) |
| DML (UPDATE/DELETE/MERGE) | Full (Spec v2) | Full |
| Governance | Apache Software Foundation | Linux Foundation |
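To make the hidden partitioning, partition evolution, and time travel rows above concrete, here is a sketch against the same hypothetical Iceberg catalog, using the Spark session (with the Iceberg SQL extensions) configured earlier; the table name and snapshot id are hypothetical.

```python
# Sketch: hidden partitioning and metadata-only partition evolution in Iceberg.
spark.sql("""
    CREATE TABLE lakehouse.db.clicks (
        event_ts TIMESTAMP,
        user_id  BIGINT,
        url      STRING
    ) USING iceberg
    PARTITIONED BY (days(event_ts))  -- hidden: queries filter on event_ts itself
""")

# Evolve the partition spec without rewriting existing files: old files keep the
# daily spec, new writes use the hourly spec, and query planning handles both.
spark.sql("ALTER TABLE lakehouse.db.clicks ADD PARTITION FIELD hours(event_ts)")
spark.sql("ALTER TABLE lakehouse.db.clicks DROP PARTITION FIELD days(event_ts)")

# Time travel: query an earlier snapshot by id or by timestamp (values hypothetical).
spark.sql("SELECT count(*) FROM lakehouse.db.clicks VERSION AS OF 8159919086214917499").show()
spark.sql("SELECT count(*) FROM lakehouse.db.clicks TIMESTAMP AS OF '2024-06-01 00:00:00'").show()
```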
## Ecosystem Support
Apache Iceberg is supported natively by virtually every major query engine:
- Spark, Flink, Trino, Presto, Dremio, Athena, BigQuery, Snowflake, DuckDB, PyIceberg, StarRocks, Doris, Hive 4.x, and more.
- AWS, GCP, and Azure all offer managed Iceberg support (Amazon S3 Tables, BigLake managed tables, and Iceberg interoperability in Microsoft Fabric’s OneLake).
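As one small illustration of those multi-engine reads, the sketch below scans an Iceberg table from DuckDB's iceberg extension; the metadata path is hypothetical, and any object-store credentials are assumed to be configured separately.

```python
# Sketch: reading an Iceberg table from DuckDB, a completely different engine,
# by pointing iceberg_scan at the table's current metadata file (path hypothetical).
import duckdb

con = duckdb.connect()
con.execute("INSTALL iceberg; LOAD iceberg;")
count = con.execute(
    "SELECT count(*) FROM iceberg_scan('/data/warehouse/db/clicks/metadata/v3.metadata.json')"
).fetchone()[0]
print(count)
```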
Delta Lake is supported by:
- Spark and the Databricks Runtime (best-in-class support).
- Trino, Flink, Redshift, BigQuery, Athena, and others via the Delta connector.
- Databricks Unity Catalog as primary governance.
Delta’s ecosystem is narrower than Iceberg’s, particularly outside the Spark/Databricks world.
## When to Choose Apache Iceberg
- You need true multi-engine writes (Spark + Flink + Dremio + Trino, all as first-class writers).
- You require open catalog governance (Apache Polaris, Nessie, Glue — all vendor-neutral).
- You are building on AWS, GCP, or Azure without Databricks as the center.
- You need AI agent and MCP integration via Dremio’s Agentic Lakehouse.
- You value governance and compliance backed by Apache Software Foundation ownership.
## When Delta Lake May Be Preferred
- You are all-in on Databricks and want the tightest platform integration.
- Your primary compute is Spark via Databricks Runtime.
- You rely on Databricks-specific features (Delta Live Tables, Photon, DBSQL).
- You want Unity Catalog as your governance platform within the Databricks world.
## The Industry Trajectory
Since 2023, the industry has clearly consolidated on Apache Iceberg as the multi-engine interoperability standard. Evidence:
- Snowflake co-created Apache Polaris with Dremio and launched Snowflake Open Catalog.
- AWS launched S3 Tables (native managed Iceberg).
- Google launched BigLake Managed Tables (native Iceberg).
- Databricks itself added UniForm (Delta → Iceberg compatibility) — acknowledging Iceberg’s ecosystem dominance.
Even Delta Lake’s vendor (Databricks) has invested in Iceberg compatibility, which signals where the ecosystem is heading.