Iceberg Specification, Schema & Internals
Last updated: May 29, 2026

Iceberg Manifest Entry Schema

The Avro schema that defines how Iceberg tracks data and delete files within manifest files, including columns for entry status, snapshot IDs, and file statistics.

Tags: iceberg manifest entry schema, manifest entry avro, iceberg file stats


The Iceberg Manifest Entry Schema defines the format of record entries inside an Iceberg manifest file. While manifest files are logically represented as tables of file paths and statistics, they are physically stored as immutable Avro files. Each row in a manifest file conforms to this schema, tracking the lifecycle and metrics of a single data or delete file.
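Because every manifest row is an Avro record, the entry schema can be written as an Avro record definition. The sketch below is abbreviated and illustrative: the top-level field names and `field-id` values follow the format v2 manifest entry layout, but the nested `data_file` record is trimmed to four fields and its record name is a placeholder. The helper `missing_required_fields` is hypothetical, a shallow check that every non-nullable top-level field is present in a record.

```python
import json

# Abbreviated, illustrative Avro schema for one manifest entry (format v2 layout).
# Optional fields are Avro unions with "null"; Iceberg keeps field IDs in "field-id".
MANIFEST_ENTRY_SCHEMA = {
    "type": "record",
    "name": "manifest_entry",
    "fields": [
        {"name": "status", "type": "int", "field-id": 0},
        {"name": "snapshot_id", "type": ["null", "long"], "default": None, "field-id": 1},
        {"name": "sequence_number", "type": ["null", "long"], "default": None, "field-id": 3},
        {"name": "file_sequence_number", "type": ["null", "long"], "default": None, "field-id": 4},
        {
            "name": "data_file",
            "field-id": 2,
            "type": {
                "type": "record",
                "name": "data_file_record",  # placeholder record name
                "fields": [  # trimmed: the real struct has many more fields
                    {"name": "file_path", "type": "string", "field-id": 100},
                    {"name": "file_format", "type": "string", "field-id": 101},
                    {"name": "record_count", "type": "long", "field-id": 103},
                    {"name": "file_size_in_bytes", "type": "long", "field-id": 104},
                ],
            },
        },
    ],
}


def missing_required_fields(record: dict, schema: dict = MANIFEST_ENTRY_SCHEMA) -> list:
    """Hypothetical helper: names of non-nullable top-level fields absent from record."""
    missing = []
    for f in schema["fields"]:
        nullable = isinstance(f["type"], list) and "null" in f["type"]
        if not nullable and f["name"] not in record:
            missing.append(f["name"])
    return missing


# The dict serializes directly to the JSON form used in .avsc files.
print(json.dumps(MANIFEST_ENTRY_SCHEMA, indent=2)[:60])
```

For example, `missing_required_fields({"status": 1})` reports `data_file` as missing, while the nullable `snapshot_id` and sequence-number fields are not required.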

Schema Structure and Fields

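The schema contains metadata-tracking columns alongside a nested struct describing the physical file. The top-level columns in the schema are:

| Field Name | Type | Description |
|---|---|---|
| status | int | Lifecycle state of the tracked file: 0 = EXISTING, 1 = ADDED, 2 = DELETED. |
| snapshot_id | long | ID of the snapshot that added the file, or that deleted it when status is 2. May be null and inherited. |
| sequence_number | long | Data sequence number of the file (format v2 and later). May be null and inherited. |
| file_sequence_number | long | Sequence number of the snapshot that added the file (format v2 and later). May be null and inherited. |
| data_file | struct | The nested struct describing the physical file. |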
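The status column records where a file sits in its lifecycle: 0 = EXISTING, 1 = ADDED, 2 = DELETED. The sketch below shows how a reader might interpret it. `ManifestEntry` is a hypothetical, trimmed-down stand-in for a full entry (no `data_file` struct, no sequence-number inheritance), and `live_files` and `added_in_snapshot` are illustrative helpers, not an Iceberg library API. DELETED entries only record that a snapshot removed a file, so they are excluded when collecting files to scan.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional


class EntryStatus(IntEnum):
    """Values of the manifest entry status column."""
    EXISTING = 0
    ADDED = 1
    DELETED = 2


@dataclass(frozen=True)
class ManifestEntry:
    """Hypothetical trimmed entry; real entries carry the full data_file struct."""
    status: EntryStatus
    snapshot_id: Optional[int]
    file_path: str


def live_files(entries: List[ManifestEntry]) -> List[str]:
    """Paths still part of the table: DELETED entries only record a removal."""
    return [e.file_path for e in entries if e.status != EntryStatus.DELETED]


def added_in_snapshot(entries: List[ManifestEntry], snapshot_id: int) -> List[str]:
    """Paths whose entry says they were added by the given snapshot."""
    return [
        e.file_path
        for e in entries
        if e.status == EntryStatus.ADDED and e.snapshot_id == snapshot_id
    ]
```

With entries for `a.parquet` (EXISTING, snapshot 1), `b.parquet` (ADDED, snapshot 2), and `c.parquet` (DELETED, snapshot 2), `live_files` returns `a.parquet` and `b.parquet`, and `added_in_snapshot(entries, 2)` returns only `b.parquet`.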

The Nested File Struct

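The nested data_file struct stores properties that engines use during query planning and file pruning. Despite its name, the same struct also describes delete files: in format v2, its content field marks a file as data (0), position deletes (1), or equality deletes (2). Key fields include: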

| Field Name | Type | Description |
|---|---|---|
| file_path | string | Full URI of the file in cloud or object storage. |
| file_format | string | The file format: PARQUET, AVRO, or ORC. |
| partition | struct | Tuple of partition values matching the table's partition spec. |
| record_count | long | Number of rows stored in the file. |
| file_size_in_bytes | long | Total file size in bytes, used when planning read splits. |
| column_sizes | map<int, long> | Column ID to total on-disk size in bytes, useful for estimating projection cost. |
| value_counts | map<int, long> | Column ID to number of values, including nulls and NaNs. |
| null_value_counts | map<int, long> | Column ID to number of null values, used to prune files for IS NULL and IS NOT NULL predicates. |
| nan_value_counts | map<int, long> | Column ID to number of floating-point NaN values. |
| lower_bounds | map<int, binary> | Column ID to the binary-serialized lower bound of the column's values, used for data skipping. |
| upper_bounds | map<int, binary> | Column ID to the binary-serialized upper bound of the column's values, used for data skipping. |
| sort_order_id | int | ID of the sort order the file was written with. |
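The lower_bounds and upper_bounds values are raw bytes, so an engine must decode them using the column's type before comparing. The sketch below covers a subset of Iceberg's single-value binary serialization as we understand it: little-endian 4-byte int and date (days since epoch), 8-byte long and timestamp (microseconds since epoch), IEEE 754 float and double, UTF-8 strings, and a single byte for booleans. The function names and the `column_types` mapping from column ID to type name are hypothetical; a real engine derives types from the table schema.

```python
import struct
from typing import Any, Dict

# struct formats for fixed-width types (subset; decimals, UUIDs, etc. omitted).
_FORMATS = {
    "int": "<i",        # 4-byte little-endian signed
    "date": "<i",       # days since epoch, stored like int
    "long": "<q",       # 8-byte little-endian signed
    "timestamp": "<q",  # microseconds since epoch, stored like long
    "float": "<f",      # 4-byte little-endian IEEE 754
    "double": "<d",     # 8-byte little-endian IEEE 754
}


def decode_bound(raw: bytes, iceberg_type: str) -> Any:
    """Decode one serialized bound for a column of the given (simplified) type."""
    if iceberg_type == "string":
        return raw.decode("utf-8")  # UTF-8 bytes, no length prefix
    if iceberg_type == "boolean":
        return raw != b"\x00"       # 0x00 is false, any other byte is true
    (value,) = struct.unpack(_FORMATS[iceberg_type], raw)
    return value


def decode_bounds(bounds: Dict[int, bytes], column_types: Dict[int, str]) -> Dict[int, Any]:
    """Decode a lower_bounds/upper_bounds map for the columns whose type is known."""
    return {
        cid: decode_bound(raw, column_types[cid])
        for cid, raw in bounds.items()
        if cid in column_types
    }
```

For example, `decode_bounds({1: struct.pack("<i", 10), 2: b"apple"}, {1: "int", 2: "string"})` yields `{1: 10, 2: "apple"}`. Writers may store truncated bounds for long strings, so decoded values should be treated as bounds rather than exact column values.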

Because these statistics are stored per file inside each manifest entry, query engines can skip files that cannot contain matching rows without opening them.
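The skipping decision is conservative: statistics can prove that a file has no matching rows, but never that it does. The sketch below applies that rule to an equality predicate and an IS NULL predicate. `FileStats`, `may_contain_eq`, `may_contain_null`, and `plan_eq_scan` are hypothetical names; the sketch assumes bounds have already been decoded into comparable Python values, and it keeps any file whose statistics are missing.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class FileStats:
    """Hypothetical, already-decoded view of one manifest entry's statistics."""
    path: str
    record_count: int
    null_value_counts: Dict[int, int] = field(default_factory=dict)
    lower_bounds: Dict[int, Any] = field(default_factory=dict)
    upper_bounds: Dict[int, Any] = field(default_factory=dict)


def may_contain_eq(stats: FileStats, column_id: int, value: Any) -> bool:
    """Return False only when the stats prove no row has column == value."""
    nulls = stats.null_value_counts.get(column_id)
    if nulls is not None and nulls == stats.record_count:
        return False  # every value is null, so no row can equal a non-null value
    lo = stats.lower_bounds.get(column_id)
    hi = stats.upper_bounds.get(column_id)
    if lo is None or hi is None:
        return True  # no bounds recorded: the file must be read
    return lo <= value <= hi


def may_contain_null(stats: FileStats, column_id: int) -> bool:
    """Return False only when null_value_counts proves the column has no nulls."""
    nulls = stats.null_value_counts.get(column_id)
    return nulls is None or nulls > 0


def plan_eq_scan(files: List[FileStats], column_id: int, value: Any) -> List[str]:
    """Paths of the files a scan for column == value still has to open."""
    return [f.path for f in files if may_contain_eq(f, column_id, value)]
```

Given one file with bounds 1 to 100 and another with bounds 200 to 300 on column 1, a scan for `column 1 == 250` keeps only the second file, plus any file that lacks statistics for that column.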

📚 Go Deeper on Apache Iceberg

Alex Merced has authored three hands-on books covering Apache Iceberg, the Agentic Lakehouse, and modern data architecture. Pick up a copy to master the full ecosystem.
