Iceberg Specification, Schema & Internals Last updated: May 29, 2026

Iceberg Column Projection

The process of reading only the specific fields requested by a query from physical storage files using unique field IDs instead of column names.

column projection, field id mapping, iceberg query optimization

Iceberg Column Projection is a read optimization, applied during scan planning, that reads only the columns required to satisfy a SQL query and avoids the cost of reading unused fields from storage. Because columnar formats like Parquet and ORC store each column's values together in contiguous chunks, query engines can skip the byte ranges of unneeded columns entirely, reducing storage I/O and network transfer costs.
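To make the I/O saving concrete, here is a minimal sketch of how a reader might turn a projection into byte ranges. The `column_chunks` layout, its sizes, and the `byte_ranges` helper are hypothetical and only illustrate the idea; real Parquet and ORC footers carry more detail.

```python
# Hypothetical footer metadata for one row group:
# column name -> (offset, length) in bytes. Values are made up.
column_chunks = {
    "customer_id": (4, 1_000),
    "amount": (1_004, 2_000),
    "notes": (3_004, 50_000),
}


def byte_ranges(projected, chunks):
    """Return the (offset, length) ranges needed for the projected columns."""
    return [chunks[name] for name in projected]


# Only the chunks for the two requested columns are fetched.
ranges = byte_ranges(["customer_id", "amount"], column_chunks)
bytes_read = sum(length for _, length in ranges)
total_bytes = sum(length for _, length in column_chunks.values())

print(f"read {bytes_read} of {total_bytes} bytes")
```

In this made-up layout the wide `notes` column dominates the file, so skipping it accounts for most of the saving.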

Iceberg improves on traditional projection techniques by mapping columns to unique, immutable integer field IDs rather than string column names.

Field ID Resolution

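In Hive-style tables, columns are resolved by name or by position. If a column is renamed or reordered, older data files no longer line up with the table schema, so they either return wrong or missing values or must be rewritten to match. Iceberg avoids this by assigning every column a field ID when it is added to the schema and recording that ID in each data file. At read time, the engine translates the column names in the query into field IDs using the current table schema, then selects columns in each file by ID, regardless of what a column was called when the file was written: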

/* The engine projects only field IDs associated with customer_id and amount */
SELECT customer_id, amount FROM sales.orders;
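The resolution step can be sketched in a few lines of Python. This is a simplified model, not the Iceberg library API: the `current_schema` mapping, the `old_file_columns` layout, the field ID values, and the `project` helper are all hypothetical.

```python
# Simplified model of field ID resolution (not the Iceberg API).

# Current table schema: column name -> field ID. The IDs are made up.
current_schema = {"customer_id": 1, "amount": 2, "region": 3, "discount": 4}

# An older data file, written when field 1 was still named "cust_id"
# and before "discount" (field 4) was added.
# Layout: field ID -> (name at write time, column values).
old_file_columns = {
    1: ("cust_id", [101, 102]),
    2: ("amount", [9.5, 20.0]),
    3: ("region", ["EU", "US"]),
}


def project(requested_names, table_schema, file_columns):
    """Resolve names to field IDs, then pick file columns by ID."""
    result = {}
    for name in requested_names:
        field_id = table_schema[name]
        column = file_columns.get(field_id)
        # A field missing from the file is modelled as None here;
        # a real reader would produce a null for every row.
        result[name] = column[1] if column is not None else None
    return result


rows = project(["customer_id", "amount"], current_schema, old_file_columns)
print(rows)
```

The rename from `cust_id` to `customer_id` does not matter because the lookup never compares names against the file. The `region` column is never touched.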

Nested Column Projection

Column projection is also applied to nested data types (structs, lists, and maps). If a table has a struct column representing user profiles, and a query requests only profile.zipcode, Iceberg projects only the nested zipcode field:

{
  "id": 4,
  "name": "profile",
  "type": {
    "type": "struct",
    "fields": [
      { "id": 5, "name": "street", "type": "string", "required": false },
      { "id": 6, "name": "zipcode", "type": "string", "required": false }
    ]
  }
}

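The projection planner tells the file reader to read field ID 6 and skip field ID 5. Because columnar formats store each leaf field of a struct as its own column, this nested pruning avoids reading the street data at all, which reduces the bytes scanned for tables with wide or deeply nested schemas.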
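As a rough illustration, the pruning can be modelled as a recursive walk over the schema that keeps only the requested field IDs. The `prune` function below is a hypothetical helper, not an Iceberg API, and it handles only structs; lists and maps are left out to keep the sketch short.

```python
def prune(field, wanted_ids):
    """Return `field` reduced to the wanted field IDs, or None if nothing is kept."""
    ftype = field["type"]
    if isinstance(ftype, dict) and ftype.get("type") == "struct":
        if field["id"] in wanted_ids:
            return field  # the whole struct was requested
        # Recurse into children and keep those that survive pruning.
        kept = [
            child
            for child in (prune(c, wanted_ids) for c in ftype["fields"])
            if child is not None
        ]
        if not kept:
            return None
        return {**field, "type": {"type": "struct", "fields": kept}}
    # Primitive field: keep it only if its ID was requested.
    return field if field["id"] in wanted_ids else None


# The profile struct from the schema above.
profile = {
    "id": 4,
    "name": "profile",
    "type": {
        "type": "struct",
        "fields": [
            {"id": 5, "name": "street", "type": "string", "required": False},
            {"id": 6, "name": "zipcode", "type": "string", "required": False},
        ],
    },
}

pruned = prune(profile, {6})  # the query asked only for profile.zipcode
print([f["name"] for f in pruned["type"]["fields"]])
```

The parent struct (field ID 4) is kept as a container so the reader can still locate zipcode inside it, but only the zipcode leaf is read.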
