Open Table Formats Explained
An open table format is a specification that defines how a table stored in object storage tracks its own state. It answers the questions that a raw directory of Parquet files cannot: which files currently belong to the table, what the schema is, what changed in the last write, and whether that write completed.
The Four Main Formats
Apache Iceberg
Iceberg was created at Netflix to handle petabyte-scale tables where Hive-style folder semantics caused reliability problems. Its defining design is the immutable snapshot tree: every commit produces a new snapshot that references a complete manifest list. No log replay is needed to find current state. The REST Catalog specification is Iceberg's other key contribution: an open HTTP API any catalog can implement and any engine can call.
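The snapshot idea can be sketched in a few lines of plain Python. This is a toy model, not the Iceberg library or its real metadata layout: the names Manifest, Snapshot, Table, and commit_append are hypothetical and exist only for illustration. The point it tries to show is that each snapshot carries a complete manifest list, so reading current (or past) state never requires replaying earlier commits.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Manifest:
    """Immutable list of data file paths (toy stand-in for a manifest file)."""
    data_files: tuple


@dataclass(frozen=True)
class Snapshot:
    """Immutable snapshot: a complete manifest list plus a parent pointer."""
    snapshot_id: int
    parent_id: Optional[int]
    manifests: tuple

    def live_files(self):
        # State is read directly from this snapshot's own manifest list;
        # no earlier snapshot is consulted or replayed.
        return [f for m in self.manifests for f in m.data_files]


@dataclass
class Table:
    """Hypothetical table holding all snapshots and a current pointer."""
    snapshots: dict = field(default_factory=dict)
    current_id: Optional[int] = None

    def commit_append(self, new_files):
        parent = self.snapshots.get(self.current_id)
        inherited = parent.manifests if parent else ()
        new_id = len(self.snapshots) + 1
        # The new snapshot reuses existing manifests and adds one new manifest.
        snap = Snapshot(new_id, self.current_id,
                        inherited + (Manifest(tuple(new_files)),))
        self.snapshots[new_id] = snap
        self.current_id = new_id
        return snap


table = Table()
table.commit_append(["a.parquet"])
table.commit_append(["b.parquet", "c.parquet"])
current = table.snapshots[table.current_id].live_files()
as_of_first_commit = table.snapshots[1].live_files()
```

Because old snapshots are never mutated, looking up snapshot 1 after the second commit still returns only the first commit's files, which is the same mechanism that makes time travel cheap in this model.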
Delta Lake
Delta Lake takes a transaction-log approach. All commits are recorded as sequential JSON files in a _delta_log/ directory. Readers replay the log from the last Parquet checkpoint forward. It works very well within the Spark ecosystem. Delta's UniForm feature (Delta 3.x+) auto-generates Iceberg metadata alongside Delta metadata, allowing external Iceberg readers to access Delta tables read-only.
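Log replay can be illustrated with a small, simplified sketch. It assumes each commit is newline-delimited JSON whose lines carry an "add" or "remove" action with a "path" field, and it ignores everything else a real log records (schema, protocol, statistics, and so on). The function name replay and the file names are hypothetical.

```python
import json


def replay(checkpoint_files, commits):
    """Rebuild the set of live data files.

    checkpoint_files: paths that were live as of the last checkpoint.
    commits: commit contents in version order, each a string of
             newline-delimited JSON actions (simplified assumption).
    """
    live = set(checkpoint_files)
    for commit in commits:
        for line in commit.splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
            # Any other action type is skipped in this sketch.
    return live


commit_1 = '{"add": {"path": "part-2.parquet"}}\n{"remove": {"path": "part-0.parquet"}}'
commit_2 = '{"commitInfo": {"operation": "WRITE"}}\n{"add": {"path": "part-3.parquet"}}'
live_files = replay(["part-0.parquet", "part-1.parquet"], [commit_1, commit_2])
```

The checkpoint bounds the cost: a reader only replays the commits written after it, rather than the full history of the table.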
Apache Hudi
Hudi was created at Uber to efficiently update ride-sharing records by primary key without rewriting entire partitions. It maintains a timeline in .hoodie/ and builds per-record indexes (bloom filter, bucket, HBase) that let writers find the exact files containing a given record key. Hudi's native incremental query returns exactly which record keys changed between two points in time.
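The two ideas in this paragraph, a record-key index and a commit timeline, can be sketched together in plain Python. This is a toy model rather than Hudi's API or storage layout: KeyedTable, upsert, and changed_keys are hypothetical names, and a plain dictionary stands in for the bloom filter, bucket, or HBase index.

```python
from collections import defaultdict


class KeyedTable:
    """Toy model: a record-key index plus a commit timeline."""

    def __init__(self):
        self.key_to_file = {}   # record key -> file holding that key
        self.timeline = []      # list of (commit_time, set of changed keys)

    def upsert(self, commit_time, records, target_file):
        """Route each record to the file that already holds its key.

        records: dict of record key -> row. Keys not seen before are
        assigned to target_file. Returns file -> keys to rewrite there.
        """
        touched = defaultdict(list)
        for key in records:
            file_id = self.key_to_file.setdefault(key, target_file)
            touched[file_id].append(key)
        self.timeline.append((commit_time, set(records)))
        return dict(touched)

    def changed_keys(self, start, end):
        """Incremental query: keys changed in commits with start < time <= end."""
        changed = set()
        for commit_time, keys in self.timeline:
            if start < commit_time <= end:
                changed |= keys
        return changed


table = KeyedTable()
table.upsert(1, {"ride-1": {}, "ride-2": {}}, "file-a")
plan = table.upsert(2, {"ride-2": {}, "ride-3": {}}, "file-b")
delta = table.changed_keys(1, 2)
```

In this sketch the second upsert only touches file-a for ride-2 and file-b for the new ride-3, which is the behavior the index exists to provide: the writer avoids scanning or rewriting files that do not contain the affected keys.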
Apache Paimon
Paimon emerged from the Flink ecosystem. Its distinguishing feature is an LSM-tree architecture (the same structure that powers RocksDB), which makes it extremely efficient for streaming writes and key-based point lookups. Of the four formats, it is the best fit for sub-second streaming ingestion with efficient lookup joins.
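For readers unfamiliar with LSM trees, the following is a minimal in-memory sketch of the general structure, not Paimon's implementation. TinyLSM and its parameters are hypothetical. Writes land in a small mutable buffer, full buffers are flushed as sorted immutable runs, lookups check the newest data first, and compaction merges runs so that the latest value for each key wins.

```python
import bisect


class TinyLSM:
    """Toy LSM tree: a memtable plus sorted runs, newest run first."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.runs = []  # each run is a sorted list of (key, value) pairs
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        # Writes only touch the in-memory buffer, which keeps them cheap.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        if self.memtable:
            self.runs.insert(0, sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in self.runs:  # newest first, so the latest value wins
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self):
        merged = {}
        for run in reversed(self.runs):  # oldest first; newer values overwrite
            merged.update(run)
        self.runs = [sorted(merged.items())] if merged else []


db = TinyLSM(memtable_limit=2)
for key, value in [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("d", 5)]:
    db.put(key, value)
latest_a = db.get("a")
db.compact()
```

The trade-off the sketch hints at is that writes never rewrite existing runs, while reads may have to consult several runs until compaction merges them.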
How a Table Format Sits in Your Architecture
Shared Properties All Four Formats Provide
| Property | What it means in practice |
|---|---|
| ACID transactions | Writers either fully commit or fully fail. No partial writes visible to readers. |
| Schema evolution | Add, rename, reorder, or drop columns without rewriting data files. |
| Time travel | Query any past version of the table by timestamp or version identifier. |
| Row-level deletes | DELETE and UPDATE statements that affect individual rows, not whole partitions. |
| Concurrent write safety | Multiple writers can operate without corrupting each other. |
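One common way to get atomic commits and concurrent write safety is optimistic concurrency: a writer prepares its changes, then publishes them with a compare-and-swap against the version it originally read. The sketch below illustrates that general technique only. The four formats differ in how they actually implement commits, and TablePointer, CommitConflict, and append_with_retry are hypothetical names.

```python
import threading


class CommitConflict(Exception):
    """Raised when another writer committed first."""


class TablePointer:
    """Toy catalog entry holding the current table version and file set."""

    def __init__(self):
        self._lock = threading.Lock()
        self.version = 0
        self.files = frozenset()

    def read(self):
        with self._lock:
            return self.version, self.files

    def compare_and_swap(self, expected_version, new_files):
        # The swap is all-or-nothing: readers see the old state or the new
        # state, never a partial write.
        with self._lock:
            if self.version != expected_version:
                raise CommitConflict(
                    f"expected v{expected_version}, found v{self.version}")
            self.version += 1
            self.files = frozenset(new_files)
            return self.version


def append_with_retry(pointer, new_file, max_attempts=3):
    """Re-read and retry when a concurrent writer wins the race."""
    for _ in range(max_attempts):
        version, files = pointer.read()
        try:
            return pointer.compare_and_swap(version, files | {new_file})
        except CommitConflict:
            continue
    raise CommitConflict("gave up after retries")


pointer = TablePointer()
stale_version, _ = pointer.read()
append_with_retry(pointer, "a.parquet")
```

A writer that still holds stale_version at this point would get a CommitConflict on its swap and would have to re-read the table before trying again, which is what keeps concurrent writers from silently overwriting each other.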
Key Differences
| Dimension | Iceberg | Delta Lake | Hudi | Paimon |
|---|---|---|---|---|
| Metadata model | Immutable snapshot tree | Sequential transaction log | Timeline + record index | LSM-tree + manifest |
| Multi-engine writes | Excellent (REST Catalog) | Good (Spark-primary) | Good (Spark + Flink) | Good (Flink-primary) |
| Open catalog spec | Yes (REST Catalog) | No (Unity proprietary) | No | No |
| Key-based upserts | Via MoR (no native index) | Via MERGE INTO | Native (built-in index) | Native (LSM) |
| Python client | PyIceberg (mature) | delta-rs (Rust-based) | Limited | Limited |
| Cloud-managed service | S3 Tables, BigLake, Microsoft Fabric | Databricks (Unity Catalog) | None major | None yet |