Apache Iceberg vs Delta Lake vs Apache Hudi
Apache Iceberg, Delta Lake, and Apache Hudi are the three formats that have become the standard choices for mutable analytical tables in object storage. Each one solves the same core problem (consistent, updatable tables on cheap storage) but with different design priorities that make them better fits for different workloads.
This comparison is vendor-neutral. The goal is to help you pick the right format based on what your workload actually needs, not based on which vendor talks loudest.
Origins and Governance
| Format | Created by | Year open-sourced | Governance | Primary design goal |
|---|---|---|---|---|
| Apache Iceberg | Netflix | 2018 | Apache Software Foundation | Multi-engine interoperability and open standards |
| Delta Lake | Databricks | 2019 | Linux Foundation Delta Lake | Reliable data lake on top of Spark |
| Apache Hudi | Uber | 2019 | Apache Software Foundation | High-frequency upserts and incremental processing |
How Each Format Tracks Table State
The transaction log, or metadata model, is the most fundamental architectural difference between the three formats: it determines how readers find the current table state and how much work each commit costs.
Iceberg builds an immutable tree per snapshot. Each snapshot points to a manifest list that summarizes all the manifests, which in turn list the data files with per-column statistics. Readers always start from a complete, self-describing snapshot without replaying a log.
Delta Lake stores a sequential log of JSON commit files. To find the current table state, a reader either replays the entire log or starts from the latest Parquet checkpoint and replays only the commits since then. This is simpler to implement, but at high commit rates it adds I/O overhead between checkpoints.
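As a rough mental model of that log-replay approach (plain Python with invented file names and action shapes, not the actual Delta protocol), the current file set falls out of folding the commit log, optionally starting from a checkpoint:

```python
# Simulated Delta-style commit log: each version is a list of actions.
# File names and action shapes are illustrative, not the real protocol.
commits = {
    0: [{"add": "part-000.parquet"}],
    1: [{"add": "part-001.parquet"}],
    2: [{"remove": "part-000.parquet"}, {"add": "part-002.parquet"}],
}

# A checkpoint summarizes table state up to some version so readers
# do not have to replay every commit from version 0.
checkpoint = {"version": 1, "files": {"part-000.parquet", "part-001.parquet"}}

def current_files(commits, checkpoint=None):
    """Replay commits (after the checkpoint, if any) to get the live file set."""
    files = set(checkpoint["files"]) if checkpoint else set()
    start = checkpoint["version"] + 1 if checkpoint else 0
    for version in sorted(v for v in commits if v >= start):
        for action in commits[version]:
            if "add" in action:
                files.add(action["add"])
            if "remove" in action:
                files.discard(action["remove"])
    return files

print(sorted(current_files(commits)))              # full replay from version 0
print(sorted(current_files(commits, checkpoint)))  # checkpoint + tail replay
```

Both calls return the same file set; the checkpoint only shortens how much of the log must be read, which is exactly the I/O trade-off described above.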
Hudi maintains a timeline in a hidden `.hoodie/` directory that records every commit, clean, compaction, and rollback as timeline actions. Hudi also stores per-record index metadata that Delta Lake and Iceberg do not, which is what enables its efficient key-based upsert capability.
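That record-key index is what makes Hudi's upserts cheap: a write only touches the file groups whose keys it updates. A toy sketch of the routing idea (hypothetical structures in plain Python, nothing like Hudi's real internals):

```python
# Sketch of key-based upsert routing, loosely modeled on Hudi's record-level
# index. All names and structures here are illustrative.
index = {}        # record key -> file group that holds the key
file_groups = {}  # file group -> {record key: row}

def upsert(records, target_group):
    """Route each record: update in place if the key is indexed, else insert."""
    for key, row in records:
        group = index.get(key)
        if group is None:            # new key: insert into the target group
            index[key] = target_group
            file_groups.setdefault(target_group, {})[key] = row
        else:                        # known key: rewrite only its file group
            file_groups[group][key] = row

upsert([("u1", {"city": "SF"}), ("u2", {"city": "NY"})], "fg-0")
upsert([("u2", {"city": "LA"}), ("u3", {"city": "TX"})], "fg-1")

# u2 stays in fg-0 (updated in place); only the new key u3 lands in fg-1.
print(file_groups)
```

Without the index, a format has to locate the existing copy of each key by scanning or by file-level statistics, which is why key-heavy update workloads favor Hudi.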
Feature Comparison
| Feature | Apache Iceberg | Delta Lake | Apache Hudi |
|---|---|---|---|
| Time travel | Yes (snapshot ID or timestamp) | Yes (version number or timestamp) | Yes (timeline-based) |
| Schema evolution | Full (column IDs, no rewrites) | Full | Full |
| Partition evolution | Yes (no rewrites) | Partial (rewrites needed for some changes) | Limited |
| Hidden partitioning | Yes | No | No |
| Row-level deletes | Yes (CoW + MoR, positional + equality) | Yes (deletion vectors in Delta 2.0+) | Yes (native, multiple strategies) |
| Branching and tagging | Yes (table-level branches and tags) | No (catalog-level via Unity only) | No |
| Record-level indexing | Bloom filters (Puffin) | Bloom filters, Z-order stats | Bloom filter, HBase, bucket, simple index |
| Open catalog standard | REST Catalog spec (open) | Unity Catalog API (proprietary) | HMS / REST (no open spec) |
| Credential vending | Yes (via Polaris, Nessie, Glue) | Via Unity Catalog (Databricks) | No standard mechanism |
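Hidden partitioning deserves a concrete illustration, since it is the one capability in the table unique to Iceberg. Writers derive partition values from a transform over a source column, and readers filter only on the source column; the engine prunes partitions by applying the same transform to the predicate bounds. A toy sketch in plain Python (the `day_transform` stand-in and all names are illustrative, not Iceberg's implementation):

```python
from datetime import datetime

# The "table spec" declares a day() transform over event_ts; readers
# never reference the derived partition column directly.
def day_transform(ts: datetime) -> str:
    return ts.strftime("%Y-%m-%d")

partitions = {}  # derived partition value -> rows

def write(rows):
    for row in rows:
        partitions.setdefault(day_transform(row["event_ts"]), []).append(row)

def scan(lo: datetime, hi: datetime):
    """Prune partitions by transforming the predicate bounds, then filter rows."""
    lo_day, hi_day = day_transform(lo), day_transform(hi)
    pruned = [p for p in sorted(partitions) if lo_day <= p <= hi_day]
    return [r for p in pruned for r in partitions[p] if lo <= r["event_ts"] <= hi]

write([
    {"id": 1, "event_ts": datetime(2025, 1, 1, 9)},
    {"id": 2, "event_ts": datetime(2025, 1, 2, 9)},
    {"id": 3, "event_ts": datetime(2025, 1, 3, 9)},
])
print([r["id"] for r in scan(datetime(2025, 1, 2), datetime(2025, 1, 3, 23))])
```

The user query mentions only `event_ts`, yet the first partition is never read. Because the transform lives in table metadata rather than in the query, changing it later is a metadata change, which is also why Iceberg can evolve partitioning without rewriting data.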
Multi-Engine Support
Multi-engine support is where the formats diverge most clearly.
| Engine | Iceberg (read/write) | Delta Lake (read/write) | Hudi (read/write) |
|---|---|---|---|
| Apache Spark | Full | Full (best-in-class) | Full |
| Apache Flink | Full | Read + limited write | Full |
| Trino | Full | Read + write (connector) | Read (connector) |
| Dremio | Full (native) | Read (external table) | Limited |
| AWS Athena | Full | Full | Read |
| Google BigQuery | Full (BigLake) | No | No |
| Snowflake | Full (Iceberg tables + Open Catalog) | No | No |
| DuckDB | Read + partial write | No | No |
| Native Python client | Full (PyIceberg) | Partial (delta-rs / `deltalake`) | No equivalent |
Delta Lake's UniForm feature (available since Delta 3.x) auto-generates Iceberg metadata alongside Delta metadata, allowing external Iceberg readers to access Delta tables in read-only mode. This is Databricks acknowledging that Iceberg's ecosystem reach is broader.
Streaming and Incremental Processing
| Format | Streaming support | Best fit |
|---|---|---|
| Apache Iceberg | Flink sink (exactly-once), snapshot-diff incremental reads | Streaming + batch hybrid workloads |
| Delta Lake | Spark Structured Streaming, Delta Change Data Feed (CDF) | Databricks DLT pipelines |
| Apache Hudi | Native incremental query by key, Flink + Spark streaming | Key-based upsert pipelines |
Hudi's native incremental query is more precise than Iceberg's snapshot-diff approach when you need to know exactly which record keys changed between two points in time. Iceberg's snapshot diff tells you which files changed, which is sufficient for most use cases but less granular than Hudi's per-record change tracking.
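The granularity difference can be made concrete with a toy diff over two snapshots (pure Python with made-up data, neither format's real API):

```python
# Two ways to answer "what changed between snapshot A and snapshot B?".
# Snapshots map immutable file names to {record key: value}.
snap_a = {"f1": {"k1": 1, "k2": 2}, "f2": {"k3": 3}}
snap_b = {"f1b": {"k1": 1, "k2": 5}, "f2": {"k3": 3}, "f3": {"k4": 4}}

def file_level_changes(a, b):
    """Iceberg-style snapshot diff: every record in any new file is a candidate."""
    new_files = set(b) - set(a)
    return {k for f in new_files for k in b[f]}

def record_level_changes(a, b):
    """Hudi-style incremental query: only keys whose value actually changed."""
    a_rows = {k: v for rows in a.values() for k, v in rows.items()}
    b_rows = {k: v for rows in b.values() for k, v in rows.items()}
    return {k for k, v in b_rows.items() if a_rows.get(k) != v}

print(file_level_changes(snap_a, snap_b))    # includes the unchanged k1
print(record_level_changes(snap_a, snap_b))  # only k2 and k4
```

The file-level answer is a superset: rewriting `f1` drags the unchanged `k1` into the result, so a downstream consumer must either tolerate or deduplicate records that did not actually change.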
Governance and Ecosystem
Apache Iceberg has the most open governance ecosystem. The Iceberg REST Catalog specification is a published standard that any catalog can implement. Apache Polaris (co-created by Dremio and Snowflake), Project Nessie, AWS Glue, and Snowflake Open Catalog all implement this standard. Any engine that supports the REST spec connects to any of these catalogs without vendor-specific code.
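In practice, pointing an engine at a REST-spec catalog takes only a few catalog properties. A minimal Spark configuration sketch, with the catalog name and URI as placeholders:

```properties
spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.my_catalog.type=rest
spark.sql.catalog.my_catalog.uri=https://catalog.example.com/api/catalog
```

The same three-line pattern works whether the endpoint is Polaris, Nessie, Glue, or Snowflake Open Catalog, which is the interoperability payoff of a published spec.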
Delta Lake has Databricks Unity Catalog, which provides strong governance within the Databricks ecosystem. Unity is proprietary, so its governance capabilities are not available to other engines without going through Databricks.
Apache Hudi relies primarily on the Hive Metastore for catalog services and does not have an open catalog API equivalent to the Iceberg REST spec.
Decision Framework
- Choose Apache Iceberg if you need multi-engine access, open catalog governance, or a cloud-native (AWS/GCP/Azure) architecture. Then check: does your engine support Iceberg natively? (Yes for Spark, Flink, Trino, Dremio, Athena, BigQuery, Snowflake, and DuckDB.)
- Choose Delta Lake if you are all-in on Databricks + Spark with Unity Catalog for governance. Then check: are you comfortable with Databricks as your primary compute?
- Choose Apache Hudi for high-frequency key-based upserts in Spark-primary streaming pipelines. Then check: do you need per-record incremental semantics or record-key-based indexing?
| Your situation | Best format |
|---|---|
| New project, no existing vendor commitment | Apache Iceberg |
| All-in Databricks, using Unity Catalog | Delta Lake |
| Spark-based CDC pipeline with frequent key-based updates | Apache Hudi |
| AI agent analytics on enterprise data | Apache Iceberg (Dremio + Polaris) |
| Multi-cloud or multi-engine architecture | Apache Iceberg |
| AWS S3-native managed table service | Apache Iceberg (S3 Tables) |
| Google Cloud-native managed table service | Apache Iceberg (BigLake) |
| Existing Databricks + Delta tables, need Iceberg access | Delta Lake with UniForm (read-only external Iceberg) |
The Industry Direction in 2025
The clearest signal of where the market is going is that multiple companies have invested in Iceberg compatibility even when their primary format is different. Databricks shipped UniForm specifically because external engines need Iceberg access. Snowflake co-created Apache Polaris with Dremio. AWS launched S3 Tables as a native managed Iceberg service. Google launched BigLake Managed Tables on Iceberg. Every major cloud provider has bet on Iceberg as the interoperability layer.
That does not mean Delta Lake and Hudi are going away. Both have strong user bases and well-defined use cases. But for new projects where multi-engine access and open governance matter, Iceberg is the safer long-term choice.
Go Deeper
- Apache Iceberg Explained — the full Iceberg overview
- Iceberg vs Delta Lake (KB) — detailed technical comparison
- Iceberg vs Apache Hudi (KB) — detailed technical comparison
- Four-Format Comparison Including Paimon (KB)
- Iceberg REST Catalog and Apache Polaris