Dremio-Specific Engine & Optimizations Last updated: May 29, 2026

Dremio Iceberg Metadata Sync

Dremio Iceberg Metadata Sync is the background coordination process that updates Dremio's catalog pointer to reference the latest snapshot metadata file of external Iceberg tables.


The sync process updates Dremio’s internal metadata pointer so that it references the latest metadata file of an Apache Iceberg table. This matters in open data lakehouse architectures, where multiple query engines (such as Apache Spark, Apache Flink, and Trino) may write to the same Iceberg tables.

When an external engine commits a transaction, it creates a new metadata JSON file (for example, v3.metadata.json) containing the new table state. Dremio must learn about this new file to plan queries against the updated data.
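To make the idea of a stale pointer concrete, here is a minimal sketch in Python. It assumes the vN.metadata.json naming used in the example above (catalog-managed tables often use other naming schemes), and the function names latest_metadata_file and pointer_is_stale are hypothetical helpers for illustration, not part of Dremio or Iceberg.

```python
import re
from typing import Iterable, Optional

# Matches names such as "v3.metadata.json" and captures the version number.
# Assumption: metadata files follow the vN.metadata.json pattern.
_METADATA_NAME = re.compile(r"^v(\d+)\.metadata\.json$")


def latest_metadata_file(names: Iterable[str]) -> Optional[str]:
    """Return the metadata file name with the highest version, or None."""
    best_version = -1
    best_name = None
    for name in names:
        match = _METADATA_NAME.match(name)
        if match is None:
            continue  # skip files that are not versioned metadata JSON
        version = int(match.group(1))
        if version > best_version:
            best_version, best_name = version, name
    return best_name


def pointer_is_stale(current_pointer: Optional[str], names: Iterable[str]) -> bool:
    """True when a newer metadata file exists than the one currently referenced."""
    latest = latest_metadata_file(names)
    return latest is not None and latest != current_pointer
```

For example, if the pointer references v2.metadata.json and the table's metadata directory also contains v3.metadata.json, pointer_is_stale returns True, which is the situation a sync has to resolve. Versions are compared numerically, so v10 sorts after v9.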

Sync Mechanisms

Dremio synchronizes its catalog pointers using two primary methods depending on how the data source is configured:

1. Catalog-Managed Sync

This method applies when Dremio is connected to a shared catalog (such as Apache Polaris, AWS Glue, or Hive Metastore).

2. File-Based Storage Sync (Object Store Directory Sync)

This method applies when Dremio accesses Iceberg tables directly from file paths (such as directories on S3 or ADLS without a catalog manager). A metadata refresh can be requested with a SQL statement such as:

ALTER TABLE analytics.orders REFRESH METADATA;
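When several tables need the same treatment, the statement can be generated rather than typed by hand. The following Python sketch only builds the SQL text. The helper names quote_identifier and refresh_statement are hypothetical, and wrapping each path part in double quotes is an assumption about how identifiers are quoted, so check it against your environment before use.

```python
from typing import Sequence


def quote_identifier(part: str) -> str:
    """Wrap one path part in double quotes (assumed identifier quoting)."""
    if not part or '"' in part:
        # Keep the sketch simple: reject empty parts and embedded quotes
        # instead of guessing at escaping rules.
        raise ValueError(f"unsupported identifier: {part!r}")
    return f'"{part}"'


def refresh_statement(table_path: Sequence[str]) -> str:
    """Build an ALTER TABLE ... REFRESH METADATA statement for a table path."""
    if not table_path:
        raise ValueError("table_path must contain at least one part")
    quoted = ".".join(quote_identifier(part) for part in table_path)
    return f"ALTER TABLE {quoted} REFRESH METADATA"


# Build the statement for the table used in the example above.
statement = refresh_statement(["analytics", "orders"])
```

The resulting string can then be submitted through whichever SQL client or job scheduler is already in use. Submitting it is left out here on purpose, since connection details vary by deployment.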

Why Sync Matters

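Reliable metadata synchronization is essential in multi-engine environments. If Dremio's pointer lags behind the table's latest metadata file, Dremio plans queries against an outdated table state and misses data committed by other engines.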

