File & Metadata Layer | Last updated: May 14, 2026

Iceberg Manifest File

An Iceberg manifest file is an Avro metadata file that tracks a subset of an Iceberg table's data files, recording each file's location, partition values, record counts, and column-level statistics used for data skipping.



An Iceberg manifest file is the third level in Iceberg’s metadata hierarchy, sitting between the manifest list (snapshot level) and the actual data files. Each manifest file is an Avro-format metadata file that tracks a subset of the table’s data files (or delete files), recording detailed statistics about each tracked file.

Manifests are the workhorse of Iceberg’s query planning engine: the column-level statistics they contain are what enable data skipping — eliminating data files from a query before they are opened based on their min/max values.

Position in the Metadata Hierarchy

Snapshot → Manifest List → Manifest File ← you are here
                                 └── Data File entries (with stats)

Contents of a Manifest File Entry

Each row in a manifest file describes one data file (or delete file) and contains:

Field                 Description
status                ADDED, EXISTING, or DELETED
snapshot_id           The snapshot that added this file
sequence_number       Write order, used for conflict resolution
file_path             Full URI of the data file in object storage
file_format           PARQUET, ORC, or AVRO
partition             The partition values for this file
record_count          Number of rows in the file
file_size_in_bytes    Physical file size
column_sizes          Map of column ID → byte size in the file
value_counts          Map of column ID → non-null value count
null_value_counts     Map of column ID → null value count
nan_value_counts      Map of column ID → NaN count (for float columns)
lower_bounds          Map of column ID → serialized minimum value
upper_bounds          Map of column ID → serialized maximum value
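For illustration, one manifest entry can be sketched as a plain Python structure. This is a simplified stand-in for the Avro record described above, not Iceberg's actual classes, and the example values (paths, IDs) are hypothetical:

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class ManifestEntry:
    """Simplified sketch of one manifest-file row (not the real Avro schema)."""
    status: str                  # "ADDED", "EXISTING", or "DELETED"
    snapshot_id: int             # snapshot that added this file
    sequence_number: int         # write order for conflict resolution
    file_path: str               # full URI of the data file
    file_format: str             # "PARQUET", "ORC", or "AVRO"
    partition: Dict[str, str]    # partition values for this file
    record_count: int
    file_size_in_bytes: int
    # Column-level stats, keyed by column ID:
    value_counts: Dict[int, int] = field(default_factory=dict)
    null_value_counts: Dict[int, int] = field(default_factory=dict)
    lower_bounds: Dict[int, bytes] = field(default_factory=dict)
    upper_bounds: Dict[int, bytes] = field(default_factory=dict)

# Hypothetical entry for one newly added Parquet file:
entry = ManifestEntry(
    status="ADDED",
    snapshot_id=8,
    sequence_number=42,
    file_path="s3://bucket/db/orders/data/00000-0.parquet",
    file_format="PARQUET",
    partition={"order_date": "2026-05-01"},
    record_count=120_000,
    file_size_in_bytes=64 * 1024 * 1024,
)
```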

The Power of Column-Level Statistics

The lower_bounds and upper_bounds fields are what enable data skipping — one of the most impactful performance features of Iceberg.

Example: Column Statistics in Action

A table orders has 10,000 data files. A query runs:

SELECT * FROM orders WHERE total_amount > 500.00;

Without column statistics, the engine must open all 10,000 files to find rows where total_amount > 500.

With Iceberg manifest statistics, the engine reads the manifests and checks the lower_bounds and upper_bounds entries for total_amount in each file entry:

- If a file's upper bound for total_amount is at most 500.00, no row in it can satisfy the predicate, so the file is skipped without being opened.
- Otherwise, the file may contain matching rows and must be read.

In a typical dataset with reasonable data clustering, this eliminates the majority of files before they are opened.
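That pruning logic can be sketched in a few lines of Python. The file names and bounds below are hypothetical, and real engines compare the serialized bounds from the manifest rather than plain floats:

```python
# Each tuple: (file_path, lower_bound, upper_bound) for total_amount.
file_stats = [
    ("f1.parquet", 10.00, 480.00),    # upper bound <= 500 -> skipped
    ("f2.parquet", 250.00, 499.99),   # skipped
    ("f3.parquet", 300.00, 950.00),   # may contain matches -> read
    ("f4.parquet", 501.00, 2000.00),  # read
]

def prune(files, threshold=500.00):
    """Keep only files whose upper bound can satisfy total_amount > threshold."""
    return [path for path, lo, hi in files if hi > threshold]

print(prune(file_stats))  # ['f3.parquet', 'f4.parquet']
```

Here half the files are eliminated from planning using only manifest metadata; no data file is opened to make that decision.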

Manifest File Reuse

A key performance optimization in Iceberg: manifests are reused across snapshots. When a new snapshot is created by appending new data, Iceberg only creates a new manifest for the newly added files. The existing manifests from the previous snapshot are referenced unchanged in the new manifest list.

This means snapshot creation is O(new_files), not O(total_files) — critically important for tables that accumulate millions of files.
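A minimal sketch of that reuse, with illustrative structures rather than Iceberg's actual metadata classes:

```python
# A manifest list is, in essence, an ordered collection of manifest-file paths.
snapshot_1_manifests = ["m1.avro", "m2.avro", "m3.avro"]  # existing manifests

def append_snapshot(previous_manifests, new_manifest):
    """Build the next snapshot's manifest list: reference the old manifests
    unchanged and add one new manifest covering only the newly written files."""
    return previous_manifests + [new_manifest]

snapshot_2_manifests = append_snapshot(snapshot_1_manifests, "m4.avro")

# Only one new manifest was written, regardless of table size:
assert snapshot_2_manifests == ["m1.avro", "m2.avro", "m3.avro", "m4.avro"]
assert snapshot_1_manifests == ["m1.avro", "m2.avro", "m3.avro"]  # unchanged
```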

Data Manifests vs. Delete Manifests

In Iceberg Spec v2 (which introduced row-level deletes), there are two types of manifests:

- Data manifests, which track data files.
- Delete manifests, which track delete files (position deletes and equality deletes).

The manifest list differentiates these via the content field: 0 = DATA, 1 = DELETES.
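During planning, an engine can separate the two manifest types by that content value. A small sketch with hypothetical manifest records:

```python
# content: 0 = DATA, 1 = DELETES, as recorded in the manifest list.
manifests = [
    {"path": "m1.avro", "content": 0},
    {"path": "m2.avro", "content": 1},
    {"path": "m3.avro", "content": 0},
]

data_manifests = [m["path"] for m in manifests if m["content"] == 0]
delete_manifests = [m["path"] for m in manifests if m["content"] == 1]

print(data_manifests)    # ['m1.avro', 'm3.avro']
print(delete_manifests)  # ['m2.avro']
```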

Compaction and Manifest Merging

Over time, tables with frequent small writes accumulate many small manifests (each small write creates a new manifest). This creates overhead because query planning must open many manifest files. Compaction (specifically, manifest rewriting) merges small manifests into larger ones, reducing metadata overhead.

-- Spark: rewrite manifests to reduce manifest count
CALL system.rewrite_manifests('db.orders');
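The effect of that procedure can be sketched as simple greedy bin-packing of small manifests into fewer, larger ones. The sizes and target below are illustrative, not Iceberg's actual rewrite algorithm:

```python
def merge_manifests(sizes_mb, target_mb=8):
    """Greedily pack small manifests into groups of roughly target_mb each."""
    groups, current, current_size = [], [], 0
    for size in sizes_mb:
        if current and current_size + size > target_mb:
            groups.append(current)   # close the current group
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        groups.append(current)
    return groups

small = [1, 1, 2, 3, 1, 2, 4, 1]    # eight small manifests (sizes in MB)
merged = merge_manifests(small)
print(len(merged))  # far fewer manifests to open at planning time
```

After the rewrite, query planning opens a handful of larger manifests instead of one per small write.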

Inspecting Manifest File Contents

-- Spark: view all data files tracked across manifests
SELECT * FROM db.orders.files;

-- Key columns returned:
-- file_path, file_format, record_count, file_size_in_bytes,
-- column_sizes, value_counts, null_value_counts,
-- lower_bounds, upper_bounds

📚 Go Deeper on Apache Iceberg

Alex Merced has authored three hands-on books covering Apache Iceberg, the Agentic Lakehouse, and modern data architecture. Pick up a copy to master the full ecosystem.
