
Open Table Formats Explained

An open table format is a specification that defines how a table stored in object storage tracks its own state. It answers the questions that a raw directory of Parquet files cannot: Which files currently belong to the table? What is its schema? What changed in the last write? Did that write complete?
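To make the last question concrete, here is a minimal Python sketch. It assumes a hypothetical layout in which a single `metadata.json` file names the committed data files, and the ".parquet" files are plain-text stand-ins. A reader that lists the directory picks up a file from a write that never committed. A reader that trusts the metadata does not.

```python
import json
import os
import tempfile

# Toy model, not any real format's layout: the ".parquet" files below are plain
# text stand-ins, and "metadata.json" is a hypothetical commit record.

def write_file(path, text):
    with open(path, "w") as f:
        f.write(text)

def list_by_directory(table_dir):
    """Naive reader: treats every .parquet file in the directory as table data."""
    return sorted(n for n in os.listdir(table_dir) if n.endswith(".parquet"))

def list_by_metadata(table_dir):
    """Format-aware reader: trusts only the files named in the committed metadata."""
    with open(os.path.join(table_dir, "metadata.json")) as f:
        return sorted(json.load(f)["files"])

with tempfile.TemporaryDirectory() as table_dir:
    # A completed write: the data file exists and the metadata lists it.
    write_file(os.path.join(table_dir, "part-000.parquet"), "rows 1-100")
    write_file(os.path.join(table_dir, "metadata.json"),
               json.dumps({"schema": ["id", "name"], "files": ["part-000.parquet"]}))
    # An in-flight (or crashed) write: the data file landed, but no commit happened.
    write_file(os.path.join(table_dir, "part-001.parquet"), "rows 101-150")

    naive = list_by_directory(table_dir)
    committed = list_by_metadata(table_dir)

print(naive)      # ['part-000.parquet', 'part-001.parquet']
print(committed)  # ['part-000.parquet']
```

Real formats keep much richer metadata than this, but the principle is the same: the table is whatever the committed metadata says it is, not whatever happens to be in the directory.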

The Four Main Formats

- Apache Iceberg (Netflix, 2017; Apache Software Foundation): broadest multi-engine support
- Delta Lake (Databricks, 2019; Linux Foundation): best Spark and Databricks integration
- Apache Hudi (Uber, 2016; Apache Software Foundation): native key-based upserts
- Apache Paimon (Alibaba/Flink, 2022; Apache Software Foundation): Flink-native LSM streaming

Apache Iceberg

Iceberg was created at Netflix to handle petabyte-scale tables where Hive-style folder semantics caused reliability problems. Its defining design is the immutable snapshot tree: every commit produces a new snapshot that references a complete manifest list, so no log replay is needed to find the current state. The REST Catalog specification is Iceberg's other key contribution: an open HTTP API that any catalog can implement and any engine can call.
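A minimal sketch of the snapshot idea using in-memory stand-ins. `Manifest`, `Snapshot`, and `TableMetadata` are hypothetical classes, not the PyIceberg API. Each commit creates a new immutable snapshot. Its manifest list reuses the parent's manifests and adds one new manifest. Reading the current table therefore means following one snapshot rather than replaying history.

```python
from dataclasses import dataclass, field
from typing import Optional

# Toy model of a snapshot tree; names are illustrative, not the PyIceberg API.

@dataclass(frozen=True)
class Manifest:
    data_files: tuple  # paths of data files tracked by this manifest

@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    parent_id: Optional[int]
    manifest_list: tuple  # every manifest that makes up the table at this snapshot

@dataclass
class TableMetadata:
    snapshots: dict = field(default_factory=dict)
    current_snapshot_id: Optional[int] = None

    def current_files(self):
        """Current state comes from one snapshot's manifest list; no log replay."""
        if self.current_snapshot_id is None:
            return []
        snap = self.snapshots[self.current_snapshot_id]
        return [f for m in snap.manifest_list for f in m.data_files]

    def append(self, new_files):
        """Commit: build a new immutable snapshot that reuses the parent's manifests."""
        parent = self.snapshots.get(self.current_snapshot_id)
        inherited = parent.manifest_list if parent else ()
        snap_id = len(self.snapshots) + 1
        self.snapshots[snap_id] = Snapshot(snap_id, self.current_snapshot_id,
                                           inherited + (Manifest(tuple(new_files)),))
        # In a real table, the commit writes a new metadata file and the catalog
        # atomically swaps its pointer; here we just move the current id.
        self.current_snapshot_id = snap_id

table = TableMetadata()
table.append(["a.parquet"])
table.append(["b.parquet", "c.parquet"])
print(table.current_files())  # ['a.parquet', 'b.parquet', 'c.parquet']
```

Snapshot 1 is left untouched by the second commit. That is why older table states remain readable after new writes.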

Delta Lake

Delta Lake takes a transaction-log approach. Every commit is recorded as a sequentially numbered JSON file in the _delta_log/ directory. To find the current state, readers start from the latest Parquet checkpoint and replay the commits written after it. The design works very well within the Spark ecosystem. Delta's UniForm feature (Delta Lake 3.0 and later) automatically generates Iceberg metadata alongside the Delta metadata, allowing external Iceberg readers to access Delta tables read-only.
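The replay rule can be sketched in a few lines of Python. The `add` and `remove` action names follow the Delta protocol, but everything else is simplified. Commits live in a dict keyed by version. The checkpoint is modeled as a plain set of live file paths rather than a Parquet file. `live_files` is a hypothetical helper.

```python
import json

# Simplified Delta-style log. On disk there is one zero-padded <version>.json
# file per commit under _delta_log/; here each commit is a list of JSON lines.
commits = {
    0: ['{"add": {"path": "p0.parquet"}}'],
    1: ['{"add": {"path": "p1.parquet"}}'],
    2: ['{"remove": {"path": "p0.parquet"}}', '{"add": {"path": "p2.parquet"}}'],
    3: ['{"add": {"path": "p3.parquet"}}'],
}
# A checkpoint summarizes the live file set as of a version (here, version 1).
checkpoints = {1: {"p0.parquet", "p1.parquet"}}

def live_files(commits, checkpoints, as_of=None):
    """Start from the newest usable checkpoint, then replay later commits in order."""
    latest = max(commits) if as_of is None else as_of
    usable = [v for v in checkpoints if v <= latest]
    start = max(usable) if usable else -1
    files = set(checkpoints[start]) if usable else set()
    for version in range(start + 1, latest + 1):
        for line in commits[version]:
            action = json.loads(line)
            if "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
    return sorted(files)

print(live_files(commits, checkpoints))           # ['p1.parquet', 'p2.parquet', 'p3.parquet']
print(live_files(commits, checkpoints, as_of=0))  # ['p0.parquet']
```

Passing `as_of` replays only up to an earlier version. This is the same replay-to-a-version idea that lets a log-based table answer time-travel queries.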

Apache Hudi

Hudi was created at Uber to efficiently update ride-sharing records by primary key without rewriting entire partitions. It maintains a timeline in .hoodie/ and builds per-record indexes (bloom filter, bucket, HBase) that let writers find the exact files containing a given record key. Hudi's native incremental query returns exactly which record keys changed between two points in time.
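The sketch below illustrates two of these ideas with a toy table. It uses a bucket-style index: the record key is hashed with `zlib.crc32` into a fixed number of buckets, an assumption chosen for illustration. As a result, an upsert rewrites only the bucket that can hold the key. The table also records a simple timeline of instants to answer an incremental query. `ToyHudiTable`, `bucket_for`, and `NUM_BUCKETS` are hypothetical. This is not Hudi's API or file layout, and the sketch treats the begin instant as exclusive.

```python
import zlib

NUM_BUCKETS = 4  # hypothetical fixed bucket count

def bucket_for(key):
    """Bucket-style index: the key alone determines which bucket can hold it."""
    return zlib.crc32(key.encode()) % NUM_BUCKETS

class ToyHudiTable:
    def __init__(self):
        self.buckets = {b: {} for b in range(NUM_BUCKETS)}  # bucket id -> {key: record}
        self.timeline = []  # list of (instant_time, keys written, buckets rewritten)

    def upsert(self, instant_time, records):
        keys, rewritten = set(), set()
        for key, record in records.items():
            b = bucket_for(key)           # index lookup: no scan of other buckets
            self.buckets[b][key] = record
            keys.add(key)
            rewritten.add(b)
        self.timeline.append((instant_time, keys, rewritten))
        return rewritten

    def incremental_keys(self, begin, end):
        """Record keys changed in instants with begin < instant <= end."""
        changed = set()
        for instant, keys, _ in self.timeline:
            if begin < instant <= end:
                changed |= keys
        return sorted(changed)

table = ToyHudiTable()
table.upsert("001", {"ride-1": "requested", "ride-2": "requested", "ride-3": "requested"})
rewritten = table.upsert("002", {"ride-2": "completed"})
print(rewritten == {bucket_for("ride-2")})   # True: one bucket rewritten
print(table.incremental_keys("001", "002"))  # ['ride-2']
```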

Apache Paimon

Paimon emerged from the Flink ecosystem. Its distinguishing feature is an LSM-tree architecture (the same structure that powers RocksDB), which makes it very efficient for streaming writes and key-based point lookups. It is the best fit of the four for low-latency streaming ingestion with efficient lookup joins.
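A generic LSM-tree sketch, assuming last-write-wins semantics for duplicate keys. `ToyLSM` is a hypothetical class and does not reflect Paimon's file layout, merge engines, or API. The flow has three parts:

- Writes land in an in-memory table, which is flushed as a sorted run.
- Lookups check the newest data first and binary-search each run.
- Compaction merges all runs into one.

```python
import bisect

class ToyLSM:
    """Minimal LSM tree: memtable + immutable sorted runs (newest last)."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.runs = []  # each run is (sorted keys, values in the same order)
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value  # writes are cheap in-memory updates
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        if self.memtable:
            items = sorted(self.memtable.items())  # one sequential, sorted write
            self.runs.append(([k for k, _ in items], [v for _, v in items]))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for keys, values in reversed(self.runs):  # newest run wins
            i = bisect.bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return values[i]
        return None

    def compact(self):
        merged = {}
        for keys, values in self.runs:  # oldest first, so newer values overwrite
            merged.update(zip(keys, values))
        items = sorted(merged.items())
        self.runs = [([k for k, _ in items], [v for _, v in items])] if items else []

lsm = ToyLSM()
lsm.put("user-1", "v1"); lsm.put("user-2", "v1")  # flushed as run 1
lsm.put("user-1", "v2"); lsm.put("user-3", "v1")  # flushed as run 2
print(lsm.get("user-1"), len(lsm.runs))  # v2 2
lsm.compact()
print(lsm.get("user-1"), len(lsm.runs))  # v2 1
```

Writes never modify existing runs, which is why the structure absorbs streaming writes well. The cost is on the read side: a lookup may consult several runs until compaction merges them.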

How a Table Format Sits in Your Architecture

Data Sources (Databases, APIs, Kafka) → Ingestion (Spark, Flink, Kafka Connect) → Open Table Format (Apache Iceberg / Delta / Hudi / Paimon)
Open Table Format → Object Storage (S3, GCS, ADLS): Parquet files
Open Table Format → Catalog (Apache Polaris, Glue, Nessie, HMS) → Query Engines (Dremio, Trino, Spark, Athena, BigQuery) → Consumers (BI tools, AI agents, ML pipelines)

Shared Properties All Four Formats Provide

| Property | What it means in practice |
|---|---|
| ACID transactions | A write either commits fully or fails fully. Readers never see a partial write. |
| Schema evolution | Add, rename, reorder, or drop columns without rewriting data files. |
| Time travel | Query any retained past version of the table by timestamp or version identifier. |
| Row-level deletes | DELETE and UPDATE statements affect individual rows, not whole partitions. |
| Concurrent write safety | Multiple writers can commit to the same table. Conflicting commits are detected and retried or rejected instead of corrupting the table. |
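ACID commits, concurrent write safety, and time travel can all be illustrated with one mechanism: an atomic compare-and-swap on the table's current version. The Python sketch below assumes a hypothetical in-memory `ToyCatalog`, where a lock stands in for the atomic step. Real formats perform that swap in different places. For example, Iceberg swaps a pointer in the catalog, and Delta atomically writes the next log file. Before retrying, they also check whether the concurrent changes logically conflict. `append_with_retry` is likewise hypothetical.

```python
import threading

class ToyCatalog:
    """Hypothetical catalog: a commit succeeds only if the base version is still current."""

    def __init__(self):
        self._lock = threading.Lock()
        self.history = [(0, frozenset())]  # (version, files); list index == version

    def current(self):
        return self.history[-1]

    def compare_and_swap(self, expected_version, new_files):
        with self._lock:  # stands in for the atomic commit step
            version, _ = self.history[-1]
            if version != expected_version:
                return False  # another writer committed first
            self.history.append((version + 1, frozenset(new_files)))
            return True

    def as_of(self, version):
        """Time travel: read any retained version."""
        return sorted(self.history[version][1])

def append_with_retry(catalog, new_file, max_attempts=3):
    for _ in range(max_attempts):
        base_version, base_files = catalog.current()  # read the latest state
        if catalog.compare_and_swap(base_version, base_files | {new_file}):
            return True
    return False

catalog = ToyCatalog()
v_a, files_a = catalog.current()          # writer A reads version 0
append_with_retry(catalog, "b.parquet")   # writer B commits version 1
stale_ok = catalog.compare_and_swap(v_a, files_a | {"a.parquet"})
print(stale_ok)                           # False: A's base version is stale
append_with_retry(catalog, "a.parquet")   # A retries on top of version 1
print(catalog.as_of(2))                   # ['a.parquet', 'b.parquet']
print(catalog.as_of(1))                   # ['b.parquet']
```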

Key Differences

| Dimension | Iceberg | Delta Lake | Hudi | Paimon |
|---|---|---|---|---|
| Metadata model | Immutable snapshot tree | Sequential transaction log | Timeline + record index | LSM-tree + manifest |
| Multi-engine writes | Excellent (REST Catalog) | Good (Spark-primary) | Good (Spark + Flink) | Good (Flink-primary) |
| Open catalog spec | Yes (REST Catalog) | No (typically Unity Catalog) | No | No |
| Key-based upserts | Via MoR (no native index) | Via MERGE INTO | Native (built-in index) | Native (LSM) |
| Python client | PyIceberg (mature) | deltalake (delta-rs, Rust-based) | Limited | Limited |
| Cloud-managed service | S3 Tables, BigLake, Microsoft Fabric | Databricks (Unity Catalog) | None major | None yet |
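The "Via MoR" entry refers to merge-on-read. Below is a toy contrast with copy-on-write. It assumes data files are in-memory lists of rows and that a positional delete is a `(file, row_position)` pair. This mirrors the idea behind Iceberg position deletes, not their on-disk format, and all function names are hypothetical.

```python
# Toy contrast of copy-on-write (CoW) vs merge-on-read (MoR) row-level deletes.
data_files = {
    "f1.parquet": [{"id": 1}, {"id": 2}, {"id": 3}],
    "f2.parquet": [{"id": 4}, {"id": 5}],
}

def delete_copy_on_write(files, predicate):
    """CoW: rewrite every file that contains a matching row."""
    new_files, rewritten = {}, []
    for name, rows in files.items():
        kept = [r for r in rows if not predicate(r)]
        if len(kept) != len(rows):
            rewritten.append(name)  # whole file rewritten to drop a few rows
        new_files[name] = kept
    return new_files, rewritten

def delete_merge_on_read(files, predicate):
    """MoR: record (file, position) pairs; data files stay untouched."""
    return {(name, pos) for name, rows in files.items()
            for pos, row in enumerate(rows) if predicate(row)}

def scan_merge_on_read(files, position_deletes):
    """Readers pay the cost: skip rows listed in the delete set."""
    return [row for name, rows in files.items()
            for pos, row in enumerate(rows) if (name, pos) not in position_deletes]

def is_id_2(row):
    return row["id"] == 2

cow_files, rewritten = delete_copy_on_write(data_files, is_id_2)
deletes = delete_merge_on_read(data_files, is_id_2)
print(rewritten)  # ['f1.parquet']
print(deletes)    # {('f1.parquet', 1)}
print(scan_merge_on_read(data_files, deletes) ==
      [r for rows in cow_files.values() for r in rows])  # True
```

Copy-on-write pays at write time by rewriting the affected files. Merge-on-read writes less and shifts the work to readers until compaction folds the deletes back into the data files.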

Go Deeper

📚 Go Deeper on Apache Iceberg

Alex Merced has authored three hands-on books covering Apache Iceberg, the Agentic Lakehouse, and modern data architecture. Pick up a copy to master the full ecosystem.