Knowledge Base

Apache Iceberg Reference

An authoritative reference for every major Apache Iceberg concept, from the core table format and metadata layer to catalogs, query engines, operational patterns, and agentic data architectures. Each entry is written to stand alone and is cross-linked to related entries across the knowledge base.

222 terms across 14 categories

Core Concepts

ACID Transactions in Apache Iceberg: Apache Iceberg delivers full ACID transaction guarantees on object storage through optimistic concurrency control and at …
Apache Iceberg Spec v1 vs v2: Apache Iceberg Spec v2 introduced row-level deletes (delete files), sequence numbers, required field tracking, and impro …
Apache Iceberg Spec v3: Apache Iceberg Spec v3 introduces deletion vectors for more efficient row-level deletes, the Variant data type for semi- …
Apache Iceberg Spec v4 (Current State): Apache Iceberg Spec v4 is in early community discussion and proposal stages as of 2025, with potential features includin …
Apache Iceberg Table Format: The Apache Iceberg table format is a specification defining how data files, metadata files, manifests, and snapshots are …
Apache Iceberg vs Apache Hudi: Apache Iceberg and Apache Hudi are both open table formats for cloud lakehouses: Iceberg prioritizes multi-engine intero …
Apache Iceberg vs Delta Lake: Apache Iceberg and Delta Lake are the two dominant open table formats for cloud lakehouses: Iceberg offers superior mult …
Data Lakehouse: A data lakehouse is a modern data architecture that combines the low-cost, scalable storage of a data lake with the reli …
Hidden Partitioning in Apache Iceberg: Hidden partitioning in Apache Iceberg separates the physical partition layout from the logical table schema, allowing th …
Iceberg Column Mapping: Iceberg column mapping decouples the logical column names in the schema from the physical field names in data files usin …
Iceberg Deletion Vectors: Deletion vectors are a Spec v3 enhancement to Apache Iceberg's row-level delete mechanism, replacing positional delete f …
Iceberg Open Table Format vs. Delta Lake vs. Apache Hudi: Apache Iceberg, Delta Lake, and Apache Hudi are the three dominant open table formats competing to be the storage founda …
Iceberg Sequence Number: The Iceberg sequence number is a monotonically increasing integer assigned to each snapshot and each data/delete file, i …
Iceberg Snapshot References: Iceberg snapshot references are named pointers (branches and tags) stored in the table metadata that reference specific …
Iceberg Sort Order: An Iceberg sort order is a table-level specification stored in metadata that defines how data should be physically order …
Iceberg Table Properties: Iceberg table properties are key-value configuration settings stored in the table metadata that control write behavior, …
Iceberg Table Statistics (Puffin): Iceberg table statistics are advanced column-level metrics, including NDV (number of distinct values) estimates using Ap …
Iceberg Views: Apache Iceberg Views are named, stored SQL queries managed by the Iceberg catalog that appear as virtual tables to downs …
Open Table Format Comparison (Iceberg, Delta Lake, Hudi, Paimon): A comprehensive comparison of the four major open table formats: Apache Iceberg, Delta Lake, Apache Hudi, and Apache Pai …
Partition Evolution in Apache Iceberg: Partition evolution in Apache Iceberg lets you change a table's partitioning scheme at any time without rewriting existi …
Schema Evolution in Apache Iceberg: Schema evolution in Apache Iceberg allows you to safely add, drop, rename, reorder, and widen columns in a table without …
Time Travel in Apache Iceberg: Time travel in Apache Iceberg lets you query a table as it existed at any past snapshot or timestamp, enabling reproduci …
What is Apache Iceberg? Apache Iceberg is an open, high-performance table format for huge analytic datasets stored in data lakes, enabling ACID …

File & Metadata Layer

Catalogs

Operations & Optimization

Engines & Integrations

Agentic & AI

Cloud-Specific Integrations

Modern Lakehouse Concepts & Interoperability

Lakehouse Catalogs & Governance

Dremio-Specific Engine & Optimizations

Governance & Security

Table Format Maintenance & Operations

Iceberg Specification, Schema & Internals

Patterns & Architecture

📚 Go Deeper on Apache Iceberg

Alex Merced has authored three hands-on books covering Apache Iceberg, the Agentic Lakehouse, and modern data architecture. Pick up a copy to master the full ecosystem.

Architecting Apache Iceberg Lakehouses · Manning Publications · Alex Merced et al.

Lakehouses with Apache Iceberg & Agentic Workflows · Alex Merced

Apache Iceberg & Agentic: Connecting Structured Data · Alex Merced