File & Metadata Layer · Last updated: May 14, 2026

Apache Iceberg ORC Format

Apache ORC (Optimized Row Columnar) is an alternative columnar storage format supported by Apache Iceberg alongside Parquet. It appears most often in Iceberg tables migrated from Hive ORC workloads, and in environments that previously relied on ORC's native ACID capabilities.


Apache ORC in Apache Iceberg

Apache ORC (Optimized Row Columnar) is a columnar storage format originally developed at Hortonworks (now part of Cloudera) and maintained today as an Apache top-level project. Apache Iceberg supports ORC as one of its three native data file formats, alongside Parquet and Avro, giving teams flexibility in their choice of storage format.

While Parquet is the dominant format for Iceberg tables in most modern deployments, ORC remains relevant for organizations migrating from Hive ORC workloads and certain specific use cases.

ORC vs. Parquet in Iceberg

Both ORC and Parquet are columnar formats with similar goals. The differences matter primarily in legacy contexts:

| Feature | ORC | Parquet |
| --- | --- | --- |
| Origin | Hortonworks (Hive ecosystem) | Twitter + Cloudera (broad ecosystem) |
| Compression | Zlib, Snappy, LZ4, Zstd | Gzip, Snappy, LZ4, Zstd |
| Column statistics | Min/max, null count, sum, distinct count | Min/max, null count |
| Native ACID | Yes (Hive ACID relied on ORC) | No (Iceberg handles ACID) |
| Index structures | Row index, bloom filters | Row group statistics, bloom filters |
| Nested types | Good | Excellent (Dremel encoding) |
| Ecosystem support | Strong in Hive/Hadoop | Broader (Spark, Dremio, Trino, Arrow) |
| Iceberg adoption | Minority | Dominant |

When ORC Appears in Iceberg

1. Hive ORC Migration (Most Common)

Organizations migrating from Hive tables where ORC was the default (Hive ACID tables required ORC through Hive 3.x) end up with Iceberg tables whose data files are still ORC after an in-place migration:

-- After migrate, the table is Iceberg but data files are still ORC
CALL spark_catalog.system.migrate('db.hive_orc_table');
-- Table is now Iceberg with ORC data files (valid — Iceberg reads ORC natively)
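To trial the conversion without touching the source Hive table, Iceberg also ships a snapshot procedure that creates an independent Iceberg table over the same ORC data files. A sketch, assuming the same source table as above (the target table name is an assumption):

-- Non-destructive trial: creates a new Iceberg table referencing the
-- existing ORC files; the source Hive table is left unchanged
CALL spark_catalog.system.snapshot(
    source_table => 'db.hive_orc_table',
    table => 'db.hive_orc_table_trial'
);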

Post-migration, you can leave the ORC files in place (Iceberg reads them) or convert to Parquet via compaction:

-- Convert ORC files to Parquet during compaction
ALTER TABLE db.orders SET TBLPROPERTIES ('write.format.default' = 'parquet');
CALL spark_catalog.system.rewrite_data_files(
    table => 'db.orders',
    options => map('write-format', 'parquet')
);
-- Rewritten data files are Parquet; the prior ORC files remain referenced
-- by older snapshots until those snapshots are expired
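The superseded ORC files are physically deleted only once no retained snapshot references them, which is done by expiring snapshots. A sketch, assuming the db.orders table above (the cutoff timestamp is an example):

-- Physically remove old ORC files once the snapshots that reference
-- them are expired
CALL spark_catalog.system.expire_snapshots(
    table => 'db.orders',
    older_than => TIMESTAMP '2026-01-01 00:00:00'
);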

2. Specific ORC Advantages

ORC’s richer built-in statistics (sum, distinct count at the stripe level) can benefit some query planners that are optimized for ORC. In practice, Iceberg’s manifest-level statistics supersede most of these advantages.

3. Hive Compatibility

If the same Iceberg table must remain readable by Apache Hive (which has excellent ORC native support), keeping ORC format ensures maximum Hive compatibility.
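As a sketch of the Hive side: with the iceberg-hive-runtime jar on Hive's classpath, an Iceberg table can be exposed to Hive through the Iceberg storage handler (the table name and location below are assumptions, not values from this article):

-- Hive overlay table reading an Iceberg table by its location
CREATE EXTERNAL TABLE orders_hive
STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
LOCATION 'hdfs://warehouse/db/orders'
TBLPROPERTIES ('iceberg.catalog' = 'location_based_table');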

Writing ORC Iceberg Tables

-- Create a new Iceberg table with ORC format
CREATE TABLE db.events (
    event_id BIGINT,
    user_id  BIGINT,
    event_ts TIMESTAMP,
    payload  STRING
) USING iceberg
TBLPROPERTIES ('write.format.default' = 'orc');

-- Per-write ORC override: Spark SQL has no format hint for this; use the
-- DataFrame writer option 'write-format' instead, e.g. in PySpark:
--   spark.table("staging.new_orders").writeTo("db.orders")
--        .option("write-format", "orc").append()

ORC Compression in Iceberg

ORC supports multiple compression codecs:

ALTER TABLE db.events SET TBLPROPERTIES (
    'write.format.default' = 'orc',
    'write.orc.compression-codec' = 'zstd',  -- or: zlib, snappy, lz4, none
    'write.orc.compression-strategy' = 'speed'  -- or: compression
);
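To gauge the effect of a codec change, rewrite the data files and compare sizes through the files metadata table (column names below are from Iceberg's standard metadata tables):

-- Inspect data file sizes after changing the compression codec
SELECT file_path, file_size_in_bytes, record_count
FROM db.events.files
ORDER BY file_size_in_bytes DESC;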

Recommendation: ORC vs. Parquet for New Tables

For new Iceberg tables, default to Parquet: it is the dominant, best-tested format in Iceberg deployments and has the broadest engine support. Choose ORC only when Hive compatibility or an established ORC toolchain specifically calls for it.

For mixed-format tables (some ORC files, some Parquet after migration), Iceberg handles the format mix transparently — each data file entry in the manifest records its own format, and the reader handles format detection per file.
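That per-file bookkeeping is directly visible through the files metadata table; grouping by file_format shows how much of a mixed table is still ORC (table name assumed from the earlier examples):

-- Count data files and rows per format in a mixed ORC/Parquet table
SELECT file_format, count(*) AS data_files, sum(record_count) AS row_count
FROM db.orders.files
GROUP BY file_format;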

