Open Table Formats Explained

An open table format is a specification that defines how a table stored in object storage tracks its own state. It answers questions that a raw directory of Parquet files cannot: which files currently belong to the table, what the schema is, what changed in the last write, and whether a given write completed.

The Four Main Formats

graph TD
    ICE["Apache Iceberg (Netflix, 2017, Apache Foundation): Broadest multi-engine support"]
    DL["Delta Lake (Databricks, 2019, Linux Foundation): Best Spark + Databricks integration"]
    HUDI["Apache Hudi (Uber, 2016, Apache Foundation): Native key-based upserts"]
    PAI["Apache Paimon (Alibaba/Flink, 2022, Apache Foundation): Flink-native LSM streaming"]

Apache Iceberg

Iceberg was created at Netflix to handle petabyte-scale tables where Hive-style folder semantics caused reliability problems. Its defining design is the immutable snapshot tree: every commit produces a new snapshot that references a complete manifest list, so readers need no log replay to find the current state. The REST Catalog specification is Iceberg's other key contribution: an open HTTP API that any catalog can implement and any engine can call.
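
The snapshot idea can be illustrated with a small in-memory model. This is a sketch under simplifying assumptions, not Iceberg's actual metadata layout: the Snapshot and SnapshotTable classes are hypothetical, and a plain tuple of file names stands in for the manifest list.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Snapshot:
    """One immutable table version (hypothetical, simplified)."""
    snapshot_id: int
    parent_id: Optional[int]
    data_files: Tuple[str, ...]  # stands in for the manifest list


class SnapshotTable:
    """Toy table whose state is just a pointer to the current snapshot."""

    def __init__(self) -> None:
        self.snapshots: Dict[int, Snapshot] = {}
        self.current_id: Optional[int] = None

    def files(self, snapshot_id: Optional[int] = None) -> Tuple[str, ...]:
        # Reading any version is a single lookup: each snapshot is complete.
        sid = self.current_id if snapshot_id is None else snapshot_id
        return () if sid is None else self.snapshots[sid].data_files

    def commit(self, added: Iterable[str] = (), removed: Iterable[str] = ()) -> Snapshot:
        # A commit never mutates an old snapshot; it writes a new one.
        dropped = set(removed)
        kept = tuple(f for f in self.files() if f not in dropped)
        snap = Snapshot(len(self.snapshots) + 1, self.current_id, kept + tuple(added))
        self.snapshots[snap.snapshot_id] = snap
        self.current_id = snap.snapshot_id
        return snap


table = SnapshotTable()
table.commit(added=["a.parquet", "b.parquet"])
table.commit(added=["c.parquet"], removed=["a.parquet"])
print(table.files())    # current state: ('b.parquet', 'c.parquet')
print(table.files(1))   # older snapshot: ('a.parquet', 'b.parquet')
```

Because each snapshot carries its full file list, reading an older version costs the same as reading the current one.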

Delta Lake

Delta Lake takes a transaction-log approach. All commits are recorded as sequential JSON files in a _delta_log/ directory, and readers replay the log from the last Parquet checkpoint forward. This design works very well within the Spark ecosystem. Delta's UniForm feature (Delta 3.x+) auto-generates Iceberg metadata alongside Delta metadata, allowing external Iceberg readers to access Delta tables read-only.
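
Log replay can be sketched in a few lines. The example below is a simplified illustration, not the delta-rs or Spark reader: the replay function and the sample file names are hypothetical, the checkpoint is reduced to a set of live file paths, and only add and remove actions are handled.

```python
import json
from typing import Dict, Iterable, Set


def replay(checkpoint_files: Iterable[str], commits: Dict[int, str]) -> Set[str]:
    """Rebuild the live file set from a checkpoint plus later commits.

    `commits` maps a version number to the text of that commit file,
    one JSON action per line (assumed shape for this sketch).
    """
    live = set(checkpoint_files)
    for version in sorted(commits):          # commits must be applied in order
        for line in commits[version].splitlines():
            action = json.loads(line)
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
    return live


# Checkpoint taken at version 10, followed by two newer commits.
commits = {
    11: '{"add": {"path": "part-2.parquet"}}',
    12: '{"remove": {"path": "part-0.parquet"}}\n{"add": {"path": "part-3.parquet"}}',
}
live = replay({"part-0.parquet", "part-1.parquet"}, commits)
print(sorted(live))  # ['part-1.parquet', 'part-2.parquet', 'part-3.parquet']
```

The cost of a read grows with the number of commits since the last checkpoint, which is why checkpoints are written periodically.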

Apache Hudi

Hudi was created at Uber to efficiently update ride-sharing records by primary key without rewriting entire partitions. It maintains a timeline in .hoodie/ and builds per-record indexes (bloom filter, bucket, HBase) that let writers find the exact files containing a given record key. Hudi's native incremental query returns exactly which record keys changed between two points in time.
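
A record-key index and a timeline can be modeled as follows. This is a toy sketch, not Hudi's API: KeyIndexedTable is a hypothetical class, a Python dict stands in for the bloom filter, bucket, or HBase index, and each commit places new keys in one new file.

```python
from typing import Dict, List, Set, Tuple


class KeyIndexedTable:
    """Toy upsert table with a key-to-file index and a commit timeline."""

    def __init__(self) -> None:
        self.index: Dict[str, str] = {}              # record key -> file id
        self.files: Dict[str, Dict[str, dict]] = {}  # file id -> {key: row}
        self.timeline: List[Tuple[int, Set[str]]] = []

    def upsert(self, commit_time: int, rows: List[dict]) -> Set[str]:
        """Apply rows keyed by 'id'; return the files that were rewritten."""
        new_file = f"file-{commit_time}"
        touched: Set[str] = set()
        for row in rows:
            key = row["id"]
            # Known keys go back to their existing file; new keys get new_file.
            file_id = self.index.setdefault(key, new_file)
            self.files.setdefault(file_id, {})[key] = row
            touched.add(file_id)
        self.timeline.append((commit_time, {r["id"] for r in rows}))
        return touched

    def changed_keys(self, after: int, up_to: int) -> Set[str]:
        """Incremental query: keys written in commits with after < t <= up_to."""
        keys: Set[str] = set()
        for t, ks in self.timeline:
            if after < t <= up_to:
                keys |= ks
        return keys


rides = KeyIndexedTable()
rides.upsert(1, [{"id": "r1", "fare": 10}, {"id": "r2", "fare": 20}])
touched = rides.upsert(2, [{"id": "r2", "fare": 25}, {"id": "r3", "fare": 30}])
print(sorted(touched))                   # ['file-1', 'file-2']
print(sorted(rides.changed_keys(1, 2)))  # ['r2', 'r3']
```

The update to r2 touches only file-1, the file the index points to, rather than every file in the partition.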

Apache Paimon

Paimon emerged from the Flink ecosystem. Its distinguishing feature is an LSM-tree architecture (the same structure that powers RocksDB), which makes it extremely efficient for streaming writes and key-based point lookups. That makes Paimon a strong fit for sub-second streaming ingestion with efficient lookup joins.
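
The LSM-tree idea itself is easy to demonstrate in memory. The sketch below is a generic, minimal LSM structure, not Paimon's implementation: TinyLSM is a hypothetical class, sorted Python lists stand in for sorted files, and compaction merges everything into a single run.

```python
import bisect
from typing import Any, Dict, List, Optional, Tuple


class TinyLSM:
    """Minimal LSM tree: a write buffer plus sorted runs, newest first."""

    def __init__(self, memtable_limit: int = 2) -> None:
        self.memtable: Dict[str, Any] = {}
        self.runs: List[List[Tuple[str, Any]]] = []
        self.memtable_limit = memtable_limit

    def put(self, key: str, value: Any) -> None:
        # Writes only touch the in-memory buffer, which keeps them cheap.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        # A full buffer is written out as one immutable sorted run.
        if self.memtable:
            self.runs.insert(0, sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key: str) -> Optional[Any]:
        if key in self.memtable:
            return self.memtable[key]
        for run in self.runs:  # newest run first, so the latest value wins
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self) -> None:
        # Merge all runs into one; newer values overwrite older ones.
        merged: Dict[str, Any] = {}
        for run in reversed(self.runs):
            merged.update(run)
        self.runs = [sorted(merged.items())] if merged else []


db = TinyLSM(memtable_limit=2)
db.put("k1", 1)
db.put("k2", 2)   # buffer full: flushed as the first run
db.put("k1", 10)
db.put("k3", 3)   # second run; it shadows the older value of k1
print(db.get("k1"), db.get("k2"), len(db.runs))  # 10 2 2
db.compact()
print(db.runs)    # [[('k1', 10), ('k2', 2), ('k3', 3)]]
```

Writes never rewrite existing runs, and point lookups use binary search within each sorted run, which is where the streaming-write and lookup efficiency comes from.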

How a Table Format Sits in Your Architecture

graph LR
    A["Data Sources (Databases, APIs, Kafka)"] --> B["Ingestion (Spark, Flink, Kafka Connect)"]
    B --> C["Open Table Format (Apache Iceberg / Delta / Hudi / Paimon)"]
    C --> D["Object Storage (S3, GCS, ADLS): Parquet files"]
    C --> E["Catalog (Apache Polaris, Glue, Nessie, HMS)"]
    E --> F["Query Engines (Dremio, Trino, Spark, Athena, BigQuery)"]
    F --> G["Consumers (BI tools, AI agents, ML pipelines)"]

Shared Properties All Four Formats Provide

Property	What it means in practice
ACID transactions	Writers either fully commit or fully fail. No partial writes visible to readers.
Schema evolution	Add, rename, reorder, or drop columns without rewriting data files.
Time travel	Query any past version of the table by timestamp or version identifier.
Row-level deletes	DELETE and UPDATE statements that affect individual rows, not whole partitions.
Concurrent write safety	Multiple writers can commit concurrently without corrupting the table or each other's changes.
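
Several of these properties follow from one mechanism: a commit is a single atomic step from one table version to the next. The sketch below illustrates that with optimistic concurrency in memory. It is an assumption-laden model, not any format's real commit protocol: VersionedTable and CommitConflict are hypothetical names, and a lock stands in for whatever atomic primitive the catalog or object store provides.

```python
import threading
from typing import Dict, FrozenSet, Iterable, Optional, Tuple


class CommitConflict(Exception):
    """Raised when the table moved on since the writer read it."""


class VersionedTable:
    def __init__(self) -> None:
        self._lock = threading.Lock()  # stand-in for an atomic swap
        self.version = 0
        self.history: Dict[int, FrozenSet[str]] = {0: frozenset()}

    def read(self, version: Optional[int] = None) -> Tuple[int, FrozenSet[str]]:
        # Time travel: any committed version remains readable.
        v = self.version if version is None else version
        return v, self.history[v]

    def commit(self, base_version: int, added: Iterable[str]) -> int:
        with self._lock:
            # All or nothing: a stale writer fails without changing state.
            if base_version != self.version:
                raise CommitConflict(f"based on v{base_version}, table is at v{self.version}")
            self.version += 1
            self.history[self.version] = self.history[base_version] | frozenset(added)
            return self.version


tbl = VersionedTable()
base_a, _ = tbl.read()
base_b, _ = tbl.read()           # both writers start from version 0
v1 = tbl.commit(base_a, ["a.parquet"])
try:
    tbl.commit(base_b, ["b.parquet"])
    conflict = False
except CommitConflict:
    conflict = True              # writer B lost the race and must retry
base_b, _ = tbl.read()
v2 = tbl.commit(base_b, ["b.parquet"])
print(v1, conflict, v2)          # 1 True 2
```

Readers only ever see whole versions, so a failed or in-flight write is never visible to them.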

Key Differences

Dimension	Iceberg	Delta Lake	Hudi	Paimon
Metadata model	Immutable snapshot tree	Sequential transaction log	Timeline + record index	LSM-tree + manifest
Multi-engine writes	Excellent (REST Catalog)	Good (Spark-primary)	Good (Spark + Flink)	Good (Flink-primary)
Open catalog spec	Yes (REST Catalog)	No (Unity proprietary)	No	No
Key-based upserts	Via MoR (no native index)	Via MERGE INTO	Native (built-in index)	Native (LSM)
Python client	PyIceberg (mature)	delta-rs (Rust-based)	Limited	Limited
Cloud-managed service	S3 Tables, BigLake, Microsoft Fabric	Databricks (Unity Catalog)	None major	None yet