The Ultimate Guide to Open Table Formats - Iceberg, Delta Lake, Hudi, Paimon, and DuckLake

Modern lakehouse stacks live or die by how they manage tables on cheap, scalable object storage. That “how” is the job of open table formats, the layer that turns piles of Parquet/ORC files into reliable, ACID-compliant tables with schema evolution, time travel, and efficient query planning. If you’ve ever wrestled with brittle Hive tables, small-file explosions, or “append-only” lakes that can’t handle updates and deletes, you already know why this layer matters.

In this guide, we’ll demystify the five formats you’re most likely to encounter:
- Apache Iceberg, the batch-first powerhouse
- Delta Lake, the transaction-log-driven format behind Databricks’ lakehouse
- Apache Hudi, the incremental pioneer
- Apache Paimon, streaming-first by design
- DuckLake, which reimagines metadata with SQL

We’ll start beginner-friendly, clarifying what a table format is and why it’s essential, then progressively dive into expert-level topics: metadata internals (snapshots, logs, manifests, LSM levels), row-level change strategies (COW, MOR, delete vectors), performance trade-offs, ecosystem support (Spark, Flink, Trino/Presto, DuckDB, warehouses), and adoption trends you should factor into your roadmap.

By the end, you’ll have a practical mental model to choose the right format for your workloads, whether you’re optimizing petabyte-scale analytics, enabling near-real-time CDC, or simplifying your metadata layer for developer velocity.

Why Open Table Formats Exist

Before diving into each format, it’s worth understanding why open table formats became necessary in the first place.

Traditional data lakes, built on raw files like CSV, JSON, or Parquet, were cheap and scalable but brittle. They had no concept of transactions, which meant that if two jobs wrote data at the same time, you could easily end up with partial or corrupted results. Schema evolution was painful: renaming or reordering columns could break queries, and updating or deleting even a single row often meant rewriting entire partitions.

Meanwhile, enterprises still needed database-like features (updates, deletes, versioning, auditing) on their data lakes. That tension set the stage for open table formats. These formats layer metadata and transaction protocols on top of files to give the data lake the brains of a database while keeping its open, flexible nature.

In practice, open table formats deliver several critical capabilities:
- ACID transactions, so concurrent writers can’t corrupt a table
- Schema evolution, so columns can be added, renamed, or reordered without breaking queries
- Row-level updates and deletes, instead of rewriting entire partitions
- Time travel and versioning for auditing and reproducibility
- Efficient query planning through file-level metadata and statistics, rather than slow directory listings

In other words, table formats solve the “wild west of files” problem, turning data lakes into lakehouses that balance scalability with structure. The differences among Iceberg, Delta, Hudi, Paimon, and DuckLake lie in how they achieve this and what trade-offs they make to optimize for batch, streaming, or simplicity.

Next, we’ll walk through the history and evolution of each format to see how these ideas took shape.

The Evolution of Open Table Formats

The journey of open table formats reflects the challenges companies faced as data lakes scaled from terabytes to petabytes. Each format emerged to solve specific pain points:
- Apache Hudi (Uber, 2016): keep Hadoop/Hive tables fresh with row-level upserts, deletes, and incremental ingestion
- Delta Lake (Databricks, 2017–2018): bring reliable ACID transactions to Spark jobs writing to cloud object storage
- Apache Iceberg (Netflix, 2018): replace brittle Hive tables with scalable metadata, safe schema evolution, and true atomicity
- Apache Paimon (Alibaba, 2022): make the data lake behave like an always-up-to-date materialized view for streaming workloads
- DuckLake (DuckDB/MotherDuck, 2025): simplify the metadata layer by moving it into a SQL database

These formats represent waves of innovation:
- The first wave (Hudi, Delta Lake, Iceberg) brought transactions, schema evolution, and row-level changes to batch-oriented lakes
- The second wave (Paimon) made streaming-first ingestion and sub-minute freshness the design center
- The newest wave (DuckLake) rethinks metadata management itself, trading file-based logs and manifests for a relational catalog

Next, we’ll dive into Apache Iceberg in detail, its metadata structure, features, and why it has become the default choice for many modern lakehouse deployments.

Apache Iceberg: The Batch-First Powerhouse

Background & Origins
Apache Iceberg was born at Netflix in 2018 and donated to the Apache Software Foundation in 2019. Its mission was clear: fix the long-standing problems of Hive tables, namely unreliable schema changes, expensive directory scans, and the lack of true atomicity. Iceberg introduced a clean-slate design that scaled to petabytes while guaranteeing ACID transactions, schema evolution, and time-travel queries.

Metadata Structure
Iceberg’s metadata model is built on a hierarchy of files:
- A table metadata file (JSON) that records the current schema, partition spec, and pointers to snapshots
- Snapshots, each representing the table state after a commit
- Manifest lists (Avro), one per snapshot, pointing to the manifests that make up that snapshot
- Manifest files, which list data files along with partition values and column-level statistics
- Data files, typically Parquet or ORC

This design avoids reliance on directory listings, making planning queries over millions of files feasible.
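
To make this hierarchy concrete, here is a minimal sketch using PySpark with the Iceberg Spark runtime on the classpath; the catalog name, warehouse path, and table (lake.db.events) are illustrative assumptions, not part of any standard setup.

```python
from pyspark.sql import SparkSession

# Minimal sketch: register an Iceberg catalog and peek at its metadata tables.
# Catalog name, warehouse path, and table name are assumptions for illustration.
spark = (
    SparkSession.builder
    .appName("iceberg-metadata-peek")
    .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lake.type", "hadoop")
    .config("spark.sql.catalog.lake.warehouse", "s3://my-bucket/warehouse")
    .getOrCreate()
)

# Iceberg exposes its metadata hierarchy as queryable metadata tables.
spark.sql("SELECT snapshot_id, committed_at, operation FROM lake.db.events.snapshots").show()
spark.sql("SELECT path, added_data_files_count FROM lake.db.events.manifests").show()

# Time travel: read the table as of an earlier snapshot (snapshot id is illustrative).
spark.read.option("snapshot-id", 1234567890).table("lake.db.events").show()
```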

Core Features
- ACID transactions with snapshot isolation
- Full schema evolution (add, drop, rename, reorder columns) without rewriting data
- Hidden partitioning and partition evolution, so queries don’t depend on physical layout
- Time travel and rollback to any retained snapshot
- Scalable metadata that avoids expensive directory listings

Row-Level Changes
Initially copy-on-write, Iceberg now also supports delete files for merge-on-read semantics. Deletes can be tracked separately and applied at read time, reducing write amplification for frequent updates. Background compaction later consolidates these into optimized Parquet files.
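
As a hedged example, reusing the SparkSession and table from the sketch above: the properties below opt the table into merge-on-read for row-level changes, and MERGE INTO (which requires the Iceberg SQL extensions to be enabled) performs upserts from an assumed `updates` view.

```python
# Assumes the `spark` session and `lake.db.events` table from the previous sketch,
# plus a temp view `updates` holding changed rows; all names are illustrative.
spark.sql("""
    ALTER TABLE lake.db.events SET TBLPROPERTIES (
        'write.delete.mode' = 'merge-on-read',
        'write.update.mode' = 'merge-on-read',
        'write.merge.mode'  = 'merge-on-read'
    )
""")

# Deletes now produce small delete files instead of rewriting whole Parquet files.
spark.sql("DELETE FROM lake.db.events WHERE event_date < '2023-01-01'")

# Upserts via MERGE INTO (requires spark.sql.extensions = IcebergSparkSessionExtensions).
spark.sql("""
    MERGE INTO lake.db.events t
    USING updates s
    ON t.event_id = s.event_id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")
```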

Ecosystem & Adoption
Iceberg’s neutrality and technical strengths have driven broad adoption. It is supported in:
- Query engines such as Apache Spark, Apache Flink, Trino, Presto, Hive, and Dremio
- Cloud warehouses and services including Snowflake, Google BigQuery, and AWS Athena/Glue
- Lightweight engines like DuckDB via its Iceberg extension

By late 2024, Iceberg had become the de facto industry standard for open table formats, with adoption by Netflix, Apple, LinkedIn, Adobe, and major cloud vendors. Its community-driven governance and rapid innovation ensure it continues to evolve; recent features like row-level delete vectors and REST catalogs are making it even more capable.

Next, we’ll look at Delta Lake, the transaction-log–driven format that became the backbone of Databricks’ lakehouse vision.

Delta Lake: The Transaction-Log Backbone

Background & Origins
Delta Lake was introduced by Databricks around 2017–2018 to address Spark’s biggest gap: reliable transactions on cloud object storage. Open-sourced in 2019 under the Linux Foundation, Delta Lake became the backbone of Databricks’ lakehouse pitch, combining data warehouse reliability with the scalability of data lakes. Its design centered on a simple but powerful idea: use a transaction log to coordinate all changes.

Metadata Structure
At the core of every Delta table is the _delta_log directory:
- Numbered JSON commit files (00000000000000000000.json, 00000000000000000001.json, and so on), each recording the actions of one transaction: files added, files removed, and metadata or protocol changes
- Periodic Parquet checkpoint files that roll up the log, so readers don’t have to replay every commit from the beginning

This log-based design is simple and easy to reconstruct: replay JSON logs from the last checkpoint to reach the latest state.
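
For illustration, here is a small sketch with the `deltalake` (delta-rs) Python package, which reads the log directly; the table path and version number are assumptions.

```python
from deltalake import DeltaTable

# Open a Delta table by path; the state below is derived entirely from _delta_log.
dt = DeltaTable("s3://my-bucket/tables/orders")

print(dt.version())    # latest committed version
print(dt.files()[:5])  # data files currently "live" according to the log
print(dt.history(5))   # recent commits: operation, timestamp, parameters

# Time travel: load the table as of an earlier version (version number is illustrative).
old = DeltaTable("s3://my-bucket/tables/orders", version=3)
df = old.to_pandas()
```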

Core Features
- ACID transactions backed by the transaction log
- Schema enforcement and schema evolution
- Time travel to previous table versions
- Unified batch and streaming reads and writes
- DML support: UPDATE, DELETE, and MERGE INTO

Row-Level Changes
Delta primarily uses copy-on-write: updates and deletes rewrite entire Parquet files while marking old ones as removed in the log. This guarantees atomicity but can be expensive at scale. To mitigate this, Delta introduced deletion vectors (in newer releases), which track row deletions without rewriting whole files, bringing it closer to merge-on-read semantics. Upserts are supported via SQL MERGE INTO, commonly used for database change data capture workloads.
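
A hedged sketch of that upsert pattern with the delta-spark Python API, assuming an active SparkSession `spark` configured for Delta and a DataFrame of changed rows `updates_df`; the table path is illustrative.

```python
from delta.tables import DeltaTable

# Assumes an active SparkSession `spark` with delta-spark configured and a source
# DataFrame `updates_df`; the table path is illustrative.
target = DeltaTable.forPath(spark, "s3://my-bucket/tables/orders")

(target.alias("t")
    .merge(updates_df.alias("s"), "t.order_id = s.order_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())

# Opt in to deletion vectors (newer Delta releases) so deletes and updates
# no longer have to rewrite whole Parquet files.
spark.sql("""
    ALTER TABLE delta.`s3://my-bucket/tables/orders`
    SET TBLPROPERTIES ('delta.enableDeletionVectors' = 'true')
""")
```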

Ecosystem & Adoption
Delta Lake is strongest in the Spark ecosystem and is the default format in Databricks. It’s also supported by:
- Apache Spark (natively) and Apache Flink via the Delta connector
- Trino and Presto through their Delta Lake connectors
- Non-JVM tooling via delta-rs, with Python and Rust bindings
- Cloud platforms such as Microsoft Fabric, which standardizes on Delta for its OneLake storage

While its openness has improved since Delta 2.0, much of its adoption remains tied to Databricks. Still, Delta Lake is one of the most widely used formats in production, powering pipelines at thousands of organizations.

Next, we’ll explore Apache Hudi, the pioneer of incremental processing and near-real-time data lake ingestion.

Apache Hudi: The Incremental Pioneer

Background & Origins
Apache Hudi (short for Hadoop Upserts Deletes and Incrementals) was created at Uber in 2016 to solve a pressing challenge: keeping Hive tables up to date with fresh, continuously changing data. Uber needed a way to ingest ride updates, user changes, and event streams into their Hadoop data lake without waiting hours for batch jobs. Open-sourced in 2017 and donated to Apache in 2019, Hudi became the first widely adopted table format to support row-level upserts and deletes directly on data lakes.

Metadata Structure
Hudi organizes tables around a commit timeline stored in a .hoodie directory:
- The timeline records every action on the table (commits, delta commits, compactions, cleans, rollbacks) as timestamped instants
- Copy-on-Write (COW) tables rewrite Parquet base files on every update, keeping reads simple and fast
- Merge-on-Read (MOR) tables append updates to row-based log files and merge them with base files at read time or during compaction

This dual-mode design gives engineers control over the trade-off between write latency and read latency.
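
As an illustrative sketch (not Uber’s production setup), the PySpark write below upserts a DataFrame of changes into a merge-on-read Hudi table; the path, field names, and `updates_df` are assumptions.

```python
# Assumes a DataFrame `updates_df` of changed rows; path and field names are illustrative.
hudi_options = {
    "hoodie.table.name": "rides",
    "hoodie.datasource.write.table.type": "MERGE_ON_READ",
    "hoodie.datasource.write.operation": "upsert",
    "hoodie.datasource.write.recordkey.field": "ride_id",
    "hoodie.datasource.write.precombine.field": "updated_at",
    "hoodie.datasource.write.partitionpath.field": "city",
}

(updates_df.write
    .format("hudi")
    .options(**hudi_options)
    .mode("append")           # "append" triggers upsert semantics on an existing table
    .save("s3://my-bucket/tables/rides"))
```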

Core Features
- Record-key-based upserts and deletes
- Two table types, COW and MOR, to tune the write/read latency trade-off
- Incremental queries that return only records changed since a given commit
- Built-in table services: compaction, clustering, cleaning, and file sizing
- Ingestion utilities (DeltaStreamer / Hudi Streamer) for pulling from Kafka and databases

Row-Level Changes
Hudi was designed for this problem. In COW mode, updates rewrite files. In MOR mode, updates are appended as log blocks, making them queryable almost immediately. Readers can choose:
- Snapshot queries, which merge base and log files for the freshest view
- Read-optimized queries, which read only compacted base files for faster scans
- Incremental queries, which return just the records changed since a given commit

Deletes are handled similarly, either as soft deletes in logs or hard deletes during compaction.
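
A sketch of those three read paths with PySpark, assuming an active SparkSession `spark` with the Hudi bundle on the classpath; the path and commit timestamp are illustrative.

```python
base_path = "s3://my-bucket/tables/rides"  # illustrative table location

# Snapshot query (default): merges base files and log files for the freshest view.
snapshot_df = spark.read.format("hudi").load(base_path)

# Read-optimized query: compacted base files only, trading freshness for scan speed.
ro_df = (spark.read.format("hudi")
         .option("hoodie.datasource.query.type", "read_optimized")
         .load(base_path))

# Incremental query: only records changed since a given commit time.
inc_df = (spark.read.format("hudi")
          .option("hoodie.datasource.query.type", "incremental")
          .option("hoodie.datasource.read.begin.instanttime", "20240101000000")
          .load(base_path))
```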

Ecosystem & Adoption
Hudi integrates tightly with:
- Apache Spark and Apache Flink for ingestion and processing
- Hive, Presto, and Trino for querying
- AWS services such as EMR, Glue, and Athena, where Hudi support is built in

While Iceberg and Delta now dominate conversations, Hudi remains a strong choice for near real-time ingestion and CDC use cases, particularly in AWS-centric stacks. Its flexibility (COW vs MOR) and incremental consumption features make it especially valuable for pipelines that need fast data freshness without sacrificing reliability.

Next, we’ll examine Apache Paimon, the streaming-first format that extends Hudi’s incremental vision with an LSM-tree architecture.

Apache Paimon: Streaming-First by Design

Background & Origins
Apache Paimon began life as Flink Table Store at Alibaba in 2022, targeting the need for continuous, real-time data ingestion directly into data lakes. It entered the Apache Incubator in 2023 under the name Paimon. Unlike Iceberg or Delta, which started with batch analytics and later added streaming features, Paimon was streaming-first. Its mission: make data lakes act like a materialized view that is always up to date.

Metadata & Architecture
Paimon uses a Log-Structured Merge-tree (LSM) design inspired by database internals:
- Writes are appended as sorted run files organized into levels, so ingesting changes is cheap
- A primary key defines how rows with the same key are merged (for example, keeping the latest version)
- Background compaction merges lower levels into larger sorted files to keep reads efficient
- Snapshots and manifests track which files make up the table at any point in time

This architecture makes frequent row-level changes cheap (append-only writes) while deferring heavy merges to compaction tasks.

Core Features
- Primary-key tables with streaming upserts and deletes
- Append-only tables for logs and event streams
- Changelog generation, so downstream jobs can consume a table as a stream of changes
- Unified batch and streaming reads and writes, especially with Flink SQL
- Time travel over snapshots, with background compaction to control file counts

Row-Level Changes
Unlike Iceberg (COW with delete files) or Delta (COW with deletion vectors), Paimon is natively merge-on-read. Updates and deletes are appended as small log segments, queryable immediately. Background compaction gradually merges them into optimized columnar files. This makes Paimon highly efficient for high-velocity workloads like IoT streams, CDC pipelines, or real-time leaderboards.
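
A rough sketch of this pattern with PyFlink SQL, assuming the Paimon Flink connector is on the classpath; the catalog name, warehouse path, and schema are illustrative.

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Register a Paimon catalog; the warehouse location is an assumption.
t_env.execute_sql("""
    CREATE CATALOG paimon_cat WITH (
        'type' = 'paimon',
        'warehouse' = 's3://my-bucket/paimon'
    )
""")
t_env.execute_sql("USE CATALOG paimon_cat")

# A primary-key table: rows with the same key are merged on read, and background
# compaction folds the LSM levels into larger sorted files over time.
t_env.execute_sql("""
    CREATE TABLE IF NOT EXISTS device_state (
        device_id STRING,
        temperature DOUBLE,
        updated_at TIMESTAMP(3),
        PRIMARY KEY (device_id) NOT ENFORCED
    )
""")

# Streaming upsert: inserting an existing key becomes an update of that row.
t_env.execute_sql(
    "INSERT INTO device_state VALUES ('sensor-1', 21.5, CURRENT_TIMESTAMP)"
).wait()
```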

Ecosystem & Adoption
Paimon integrates tightly with Apache Flink, where it feels like a natural extension of Flink SQL. It also has growing support for Spark, Hive, Trino/Presto, and OLAP systems like StarRocks and Doris. Adoption is strongest among teams building streaming lakehouses, particularly those already invested in Flink. While younger than Iceberg or Delta, Paimon is rapidly attracting attention as organizations push for sub-minute data freshness.

Next, we’ll turn to DuckLake, the newest entrant that rethinks table metadata management by moving it entirely into SQL databases.

DuckLake: Metadata Reimagined with SQL

Background & Origins
DuckLake is the newest table format, introduced in 2025 by the DuckDB and MotherDuck teams. Unlike earlier formats that manage metadata with JSON logs or Avro manifests, DuckLake flips the script: it stores all table metadata in a relational SQL database. This approach is inspired by how cloud warehouses like Snowflake and BigQuery already manage metadata internally, but DuckLake makes it open and interoperable.

Metadata & Architecture
- All table metadata (schemas, snapshots, file lists, statistics) lives as rows in a SQL database, such as DuckDB, SQLite, PostgreSQL, or MySQL
- Data files remain standard Parquet on local disk or object storage
- A commit is simply an ACID transaction against the catalog database, rather than a new file written to object storage

This design dramatically reduces the complexity of planning queries (no manifest scanning), makes commits faster, and enables features like cross-table consistency (possible in Apache Iceberg if using the Nessie catalog).
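
A minimal sketch with DuckDB’s `ducklake` extension; the catalog file name and DATA_PATH are illustrative, and the ATTACH string follows the extension’s documented form.

```python
import duckdb

con = duckdb.connect()
con.execute("INSTALL ducklake")
con.execute("LOAD ducklake")

# Metadata lives in a SQL database (here a local DuckDB catalog file); data files
# land as Parquet under DATA_PATH, which could just as well be object storage.
con.execute("ATTACH 'ducklake:metadata.ducklake' AS lake (DATA_PATH 'lake_data/')")
con.execute("USE lake")

con.execute("CREATE TABLE orders (order_id INTEGER, amount DOUBLE)")
con.execute("INSERT INTO orders VALUES (1, 19.99), (2, 5.25)")
print(con.sql("SELECT * FROM orders").fetchall())
```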

Core Features
- ACID transactions that can span multiple tables in one commit
- Snapshots, time travel, and schema evolution tracked in the catalog database
- Data stored as plain Parquet files, readable by any Parquet-capable tool
- Metadata that can be inspected and managed with ordinary SQL

Row-Level Changes
DuckLake handles updates and deletes via copy-on-write on Parquet files, but the metadata transaction is nearly instantaneous. Row-level changes are coordinated by the SQL catalog, avoiding the latency and eventual consistency pitfalls of cloud storage–based logs. In effect, DuckLake behaves like Iceberg for data files but with much faster commit cycles.
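
Continuing the sketch above, row-level changes are ordinary SQL statements: DuckLake rewrites the affected Parquet files (copy-on-write), while the commit itself is a small transaction in the catalog database.

```python
# Reuses the `con` connection and `orders` table from the previous sketch.
con.execute("UPDATE orders SET amount = 21.00 WHERE order_id = 1")
con.execute("DELETE FROM orders WHERE order_id = 2")
print(con.sql("SELECT * FROM orders").fetchall())
```

Each statement becomes its own snapshot in the catalog, so earlier versions of the table remain available for time travel.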

Ecosystem & Adoption
- Shipped as a DuckDB extension, with backing from the DuckDB and MotherDuck teams
- The catalog can live in an embedded database for a laptop-scale setup or in PostgreSQL/MySQL for multi-user deployments
- Support outside DuckDB is still early, so most adopters today are DuckDB-centric teams

As of 2025, DuckLake is still young but has sparked excitement by simplifying lakehouse architecture. It’s best seen as a complement to more mature formats, with particular appeal to DuckDB users and teams tired of managing complex metadata stacks.

Next, we’ll step back and compare all five formats side by side, looking at metadata design, row-level update strategies, ecosystem support, and adoption trends.

Comparing the Open Table Formats

Now that we’ve walked through each format individually, let’s compare them across the dimensions that matter most to data engineers and architects.

1. Metadata Architecture
- Iceberg: a hierarchy of metadata files, manifest lists, and manifests; no directory listings required
- Delta Lake: a JSON transaction log with periodic Parquet checkpoints
- Hudi: a commit timeline stored in the .hoodie directory
- Paimon: LSM-style sorted runs tracked through snapshots and manifests
- DuckLake: all metadata in a relational SQL catalog database

2. Row-Level Changes
- Iceberg: copy-on-write by default, with delete files (and newer delete vectors) for merge-on-read
- Delta Lake: copy-on-write, with deletion vectors to avoid full file rewrites
- Hudi: a choice of copy-on-write or merge-on-read tables
- Paimon: natively merge-on-read via LSM appends and background compaction
- DuckLake: copy-on-write data files with near-instant metadata commits in the SQL catalog

3. Ecosystem Support
- Iceberg: the broadest engine and vendor coverage (Spark, Flink, Trino/Presto, Hive, Snowflake, BigQuery, DuckDB, and more)
- Delta Lake: deepest in Spark and Databricks, with connectors for other engines
- Hudi: strong in Spark, Flink, and AWS services (EMR, Glue, Athena)
- Paimon: strongest in Flink, with growing Spark, Trino, and OLAP-engine support
- DuckLake: DuckDB-centric today

4. Adoption Trends
- Iceberg: the de facto industry standard and safe default for new lakehouses
- Delta Lake: widely used in production, concentrated around Databricks shops
- Hudi: a durable niche in CDC and near-real-time ingestion pipelines
- Paimon: fast-growing among streaming-first, Flink-heavy teams
- DuckLake: new and experimental, but influential in how it handles metadata

Next, we’ll step back and examine industry trends shaping the adoption of these formats and what they signal for the future of the lakehouse ecosystem.

Industry Trends Shaping Adoption

The “table format wars” of the past few years are starting to settle into clear patterns of adoption. While no single format dominates every use case, the industry is coalescing around certain choices based on scale, latency, and ecosystem needs.

Iceberg as the Default Standard
Iceberg has emerged as the most widely supported and vendor-neutral choice. Cloud platforms like AWS, Google, and Snowflake have all added native support, and query engines like Trino, Presto, Hive, and Flink integrate with it out of the box. Its Apache governance and cross-engine compatibility make it the safe long-term bet for enterprises standardizing on a single open format.

Delta Lake in the Spark/Databricks World
Delta Lake remains the default in Spark- and Databricks-heavy shops. Its simplicity (transaction logs) and seamless batch/stream integration continue to attract teams already invested in Spark. While its ecosystem is narrower than Iceberg’s, Delta Lake’s deep integration with Databricks runtime and machine learning workflows ensures strong adoption in that ecosystem.

Hudi in CDC and Incremental Ingestion
Hudi carved out a niche in change data capture (CDC) and near real-time ingestion. Telecom, fintech, and e-commerce companies still rely on Hudi for incremental pipelines, especially on AWS where Glue and EMR make it easy to deploy. While Iceberg and Delta have added incremental features, Hudi’s head start and MOR tables keep it relevant for low-latency ingestion scenarios.

Paimon and the Rise of Streaming Lakehouses
As real-time analytics demand grows, Paimon is gaining momentum in the Flink community and among companies building streaming-first pipelines. Its LSM-tree design positions it as the go-to choice for high-velocity data, IoT streams, and CDC-heavy architectures. Although young, its momentum signals a broader shift: the next wave of lakehouse innovation is about sub-minute freshness.

DuckLake and Metadata Simplification
DuckLake reflects a newer trend: rethinking metadata management. By moving metadata into SQL databases, it dramatically simplifies operations and enables cross-table transactions. Adoption is still experimental, but DuckLake has sparked interest among teams who want lakehouse features without managing complex catalogs or metastores. Its trajectory will likely influence how future formats handle metadata.

Convergence and Interoperability
One notable trend: features are converging. Iceberg now supports row-level deletes via delete files; Delta added deletion vectors; Hudi and Paimon both emphasize streaming upserts. Tooling is also evolving toward interoperability: catalog services like Apache Nessie and Polaris aim to support multiple formats, and BI engines increasingly connect to all of them.

In short:
- Iceberg is the neutral, multi-engine default
- Delta Lake dominates Spark and Databricks environments
- Hudi remains the CDC and incremental-ingestion specialist
- Paimon is the streaming-first choice for Flink shops
- DuckLake points toward simpler, SQL-managed metadata

Next, we’ll wrap up with guidance on how to choose the right format based on your workloads, ecosystem, and data engineering priorities.

Choosing the Right Open Table Format

With five strong options on the table (Iceberg, Delta Lake, Hudi, Paimon, and DuckLake), the choice depends less on “which is best” and more on which aligns with your workloads, ecosystem, and priorities. Here’s how to think about it:

When to Choose Apache Iceberg
- You want a vendor-neutral format with the widest engine and warehouse support
- Your workloads are large-scale batch analytics that need safe schema evolution and time travel
- You’re standardizing multiple teams and tools on a single long-term format

When to Choose Delta Lake
- You’re heavily invested in Spark or Databricks
- You want simple, log-based transactions with unified batch and streaming pipelines
- Your machine learning and BI workloads already run on the Databricks platform

When to Choose Apache Hudi
- Your priority is CDC and near-real-time ingestion with frequent upserts and deletes
- You need incremental queries so downstream jobs process only what changed
- You run on AWS, where EMR, Glue, and Athena make Hudi easy to deploy

When to Choose Apache Paimon
- You’re building a streaming-first lakehouse, especially on Flink
- You need sub-minute freshness for high-velocity data such as IoT streams or CDC feeds
- You want merge-on-read behavior by default, with compaction handled in the background

When to Choose DuckLake
- You want lakehouse features without running a separate catalog or metastore service
- Your team is DuckDB-centric or works at laptop-to-single-node scale
- You value fast commits and cross-table transactions backed by a SQL database

Final Takeaway

No matter which you choose, adopting an open table format is the key to turning your data lake into a true lakehouse: reliable, flexible, and future-proof.

Conclusion

Open table formats are no longer niche; they’re the foundation of the modern data stack. Whether your challenge is batch analytics, real-time ingestion, or simplifying metadata, there’s a format designed to meet your needs. The smart path forward isn’t picking one blindly, but aligning your choice with your data velocity, tooling ecosystem, and long-term governance strategy.

In practice, many organizations run more than one format side by side. The good news: as open standards mature, interoperability and ecosystem support are expanding, making it easier to evolve over time without locking yourself into a dead end.

The lakehouse era is here, and open table formats are its backbone.
