Catalogs | Last updated: May 14, 2026

What is an Iceberg Catalog?

An Apache Iceberg catalog is the service responsible for tracking the current metadata file location for each Iceberg table, enabling engines to discover and access tables while supporting atomic table creation, updates, and deletion.


What is an Apache Iceberg Catalog?

An Iceberg catalog is the service that stores and manages the mapping between Iceberg table names and their current metadata file locations. It is the entry point that allows any query engine to discover and access Iceberg tables stored in object storage.

The catalog’s role is deliberately minimal and well-defined: for each table, the essential piece of information it stores is the path to the table’s current metadata file in object storage. Everything else about the table’s history, schema, partitioning, and data is encoded in the metadata file chain itself.
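To make the division of labor concrete, here is a minimal Python sketch of table resolution. It assumes one dict stands in for object storage and another stands in for the catalog; all paths are hypothetical, and the metadata JSON is heavily trimmed (real metadata files carry many more required fields), though the keys shown follow the Iceberg table spec's naming.

```python
import json

# Hypothetical object store: path -> file contents (stands in for S3, GCS, or ADLS).
object_store = {
    "s3://lake/db/orders/metadata/00001-example.metadata.json": json.dumps({
        "format-version": 2,
        "current-schema-id": 0,
        "schemas": [{"schema-id": 0, "type": "struct", "fields": [
            {"id": 1, "name": "order_id", "required": True, "type": "long"},
        ]}],
        "current-snapshot-id": 42,
        "snapshots": [{
            "snapshot-id": 42,
            "manifest-list": "s3://lake/db/orders/metadata/snap-42.avro",
        }],
    }),
}

# The catalog: one entry per table, holding only the current metadata pointer.
catalog = {
    ("db", "orders"): "s3://lake/db/orders/metadata/00001-example.metadata.json",
}


def load_table_metadata(namespace: str, name: str) -> dict:
    """Resolve a table name to its current metadata via the catalog pointer."""
    pointer = catalog[(namespace, name)]      # 1. ask the catalog where metadata lives
    return json.loads(object_store[pointer])  # 2. read everything else from storage


meta = load_table_metadata("db", "orders")
print(meta["current-snapshot-id"])              # 42
print(meta["schemas"][0]["fields"][0]["name"])  # order_id
```

The catalog never has to interpret the metadata it points to; the engine does that after step 2.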

Why a Catalog is Necessary

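Apache Iceberg tables store all their structural metadata in object storage (metadata files, manifest lists, manifest files). But object storage has no concept of a “current” or “authoritative” version of a table. Without a catalog, a reader has no reliable way to tell which metadata file is current, and concurrent writers have no shared point at which to coordinate their commits.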

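The catalog provides the atomic commit primitive: a writer’s update to the metadata pointer succeeds only if the pointer still references the metadata file that writer started from. When multiple writers compete, exactly one wins and the others must refresh and retry (or fail). This is how Iceberg achieves ACID semantics at the table level.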
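That compare-and-swap behavior can be sketched in a few lines of Python. The `PointerCatalog` class is hypothetical and uses an in-process lock to make the check and the update atomic; a real catalog gets the same guarantee from its backing store, for example a database transaction or a metastore lock.

```python
import threading


class PointerCatalog:
    """Hypothetical toy catalog whose only write path is an atomic compare-and-swap."""

    def __init__(self) -> None:
        self._pointers: dict[str, str] = {}
        self._lock = threading.Lock()

    def create(self, table: str, metadata_location: str) -> None:
        with self._lock:
            if table in self._pointers:
                raise ValueError(f"table {table} already exists")
            self._pointers[table] = metadata_location

    def current(self, table: str) -> str:
        with self._lock:
            return self._pointers[table]

    def commit(self, table: str, expected: str, new: str) -> bool:
        """Swap the pointer only if it still equals `expected`."""
        with self._lock:
            if self._pointers.get(table) != expected:
                return False  # another writer committed first
            self._pointers[table] = new
            return True


cat = PointerCatalog()
cat.create("db.orders", "v1.metadata.json")

# Two writers both start from v1 ...
base_a = cat.current("db.orders")
base_b = cat.current("db.orders")

# ... but only the first swap succeeds.
print(cat.commit("db.orders", base_a, "v2-a.metadata.json"))  # True
print(cat.commit("db.orders", base_b, "v2-b.metadata.json"))  # False
print(cat.current("db.orders"))                               # v2-a.metadata.json
```

The losing writer would then re-read the current pointer, reapply its changes on top of the new metadata, and attempt the commit again.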

What a Catalog Manages

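At minimum, an Iceberg catalog must support looking up a table’s current metadata location by name, listing the tables it tracks, and atomically creating, updating, and deleting tables.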
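One way to picture that minimal contract is as an abstract interface. The method names below are hypothetical, loosely echoing the `listTables`, `createTable`, `loadTable`, and `dropTable` methods of Iceberg’s Java `Catalog` interface. The dict-backed `DictCatalog` is single-process and not thread-safe, so it only illustrates the shape of the API.

```python
from abc import ABC, abstractmethod


class IcebergCatalog(ABC):
    """Hypothetical minimal catalog contract; names are illustrative only."""

    @abstractmethod
    def list_tables(self, namespace: str) -> list[str]: ...

    @abstractmethod
    def create_table(self, namespace: str, name: str, metadata_location: str) -> None: ...

    @abstractmethod
    def load_table(self, namespace: str, name: str) -> str:
        """Return the table's current metadata file location."""

    @abstractmethod
    def commit_table(self, namespace: str, name: str, expected: str, new: str) -> bool: ...

    @abstractmethod
    def drop_table(self, namespace: str, name: str) -> None: ...


class DictCatalog(IcebergCatalog):
    """Single-process, non-thread-safe implementation for illustration."""

    def __init__(self) -> None:
        self._tables: dict[tuple[str, str], str] = {}

    def list_tables(self, namespace):
        return sorted(n for ns, n in self._tables if ns == namespace)

    def create_table(self, namespace, name, metadata_location):
        key = (namespace, name)
        if key in self._tables:
            raise ValueError(f"{namespace}.{name} already exists")
        self._tables[key] = metadata_location

    def load_table(self, namespace, name):
        return self._tables[(namespace, name)]

    def commit_table(self, namespace, name, expected, new):
        key = (namespace, name)
        if self._tables.get(key) != expected:
            return False  # pointer moved since the writer read it
        self._tables[key] = new
        return True

    def drop_table(self, namespace, name):
        # Removes only the catalog entry; purging files in storage is a separate concern.
        del self._tables[(namespace, name)]
```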

Types of Iceberg Catalogs

Hive Metastore Catalog

The original Iceberg catalog implementation, using the Hive Metastore service to store metadata file pointers in a relational database (such as MySQL or PostgreSQL). Widely supported, but it introduces a dependency on running the JVM-based Hive Metastore (HMS).

Iceberg REST Catalog

A language-agnostic HTTP REST API specification for implementing Iceberg catalogs. Any service that implements the Iceberg REST Catalog spec can serve as an Iceberg catalog, decoupling clients from specific catalog implementations. This is the modern standard.
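To show what a language-agnostic HTTP API means in practice, the Python sketch below builds the URL for the spec’s load-table route (`GET /v1/{prefix}/namespaces/{namespace}/tables/{table}`) and reads the `metadata-location` field from a response body. The base URI, namespace, and response contents are hypothetical, and the response is trimmed far below what a real server returns.

```python
import json
from urllib.parse import quote


def load_table_url(base_uri: str, namespace: list[str], table: str, prefix: str = "") -> str:
    """Build the load-table URL described by the Iceberg REST Catalog spec.

    Multi-level namespaces are joined with the unit separator (0x1F) and then
    percent-encoded. `prefix` is optional and is typically supplied by the
    server's GET /v1/config response.
    """
    ns = quote("\x1f".join(namespace), safe="")
    root = f"{base_uri.rstrip('/')}/v1/" + (f"{prefix}/" if prefix else "")
    return f"{root}namespaces/{ns}/tables/{quote(table, safe='')}"


# Trimmed, hypothetical LoadTableResult body; real responses carry full table metadata.
response_body = json.dumps({
    "metadata-location": "s3://lake/sales/eu/orders/metadata/00003-example.metadata.json",
    "metadata": {"format-version": 2},
    "config": {},
})

url = load_table_url("https://catalog.example.com/api/catalog", ["sales", "eu"], "orders")
print(url)  # https://catalog.example.com/api/catalog/v1/namespaces/sales%1Feu/tables/orders
print(json.loads(response_body)["metadata-location"])
```

Because the contract is just HTTP and JSON, a client written in any language can follow the same steps against any conforming server.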

Apache Polaris (formerly Snowflake’s open-source Polaris)

An Apache-governed, open-source implementation of the Iceberg REST Catalog specification. Co-created by Dremio and Snowflake and donated to the Apache Software Foundation. Dremio’s Open Catalog capability is powered by Apache Polaris.

Project Nessie

A transactional metadata catalog with Git-like branch-and-merge semantics, implementing the Iceberg REST Catalog interface. Enables branch-based ETL development and zero-copy experiments.
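Branch semantics can be modeled as separate name-to-pointer maps. The class below is a hypothetical toy, not Nessie’s API. It shows why branching is zero-copy (a branch copies metadata pointers, never data files) and how a merge can detect conflicting changes. It ignores table deletion and assumes a branch is merged back into the branch it was created from.

```python
class BranchingCatalog:
    """Hypothetical toy model of Git-like branches over metadata pointers (not Nessie's API)."""

    def __init__(self) -> None:
        self.branches: dict[str, dict[str, str]] = {"main": {}}
        self._merge_base: dict[str, dict[str, str]] = {}  # branch -> pointers at fork/last merge

    def create_branch(self, name: str, from_branch: str = "main") -> None:
        # Zero-copy: only table -> metadata pointers are copied, never data files.
        self.branches[name] = dict(self.branches[from_branch])
        self._merge_base[name] = dict(self.branches[from_branch])

    def commit(self, branch: str, table: str, metadata_location: str) -> None:
        self.branches[branch][table] = metadata_location

    def merge(self, source: str, target: str = "main") -> None:
        """Apply tables changed on `source` to `target`; refuse if both sides changed them."""
        base = self._merge_base[source]
        src, tgt = self.branches[source], self.branches[target]
        changed = {t: loc for t, loc in src.items() if base.get(t) != loc}
        conflicts = sorted(t for t in changed if tgt.get(t) != base.get(t))
        if conflicts:
            raise RuntimeError(f"conflicting changes to {conflicts}")
        tgt.update(changed)
        self._merge_base[source] = dict(src)  # record the new merge base


cat = BranchingCatalog()
cat.commit("main", "orders", "v1.metadata.json")
cat.create_branch("etl")
cat.commit("etl", "orders", "v2.metadata.json")  # ETL work happens in isolation
print(cat.branches["main"]["orders"])            # v1.metadata.json
cat.merge("etl")                                 # publish atomically to main
print(cat.branches["main"]["orders"])            # v2.metadata.json
```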

AWS Glue Data Catalog

Amazon’s managed metadata catalog service, which supports Iceberg tables natively and also exposes an Iceberg REST Catalog endpoint. Tightly integrated with AWS analytics services (Athena, EMR, Glue ETL).

JDBC Catalog

A generic catalog backed by any JDBC-compatible relational database. Useful for development and small deployments.
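With a relational database underneath, the atomic pointer swap becomes a conditional SQL `UPDATE`. This sketch uses Python’s built-in `sqlite3`. The `iceberg_tables` layout is modeled on the table Iceberg’s `JdbcCatalog` creates, but treat the exact column names as an assumption to check against your Iceberg version; the catalog name `dev` and the paths are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Layout modeled on Iceberg's JdbcCatalog table; verify column names for your version.
conn.execute("""
    CREATE TABLE iceberg_tables (
        catalog_name               VARCHAR(255) NOT NULL,
        table_namespace            VARCHAR(255) NOT NULL,
        table_name                 VARCHAR(255) NOT NULL,
        metadata_location          VARCHAR(1000),
        previous_metadata_location VARCHAR(1000),
        PRIMARY KEY (catalog_name, table_namespace, table_name)
    )
""")
V1 = "metadata/00001.metadata.json"
conn.execute("INSERT INTO iceberg_tables VALUES (?, ?, ?, ?, NULL)", ("dev", "db", "orders", V1))
conn.commit()


def commit(namespace: str, table: str, expected: str, new: str) -> bool:
    """Swap the pointer with a conditional UPDATE; True only if this writer won."""
    cur = conn.execute(
        """
        UPDATE iceberg_tables
           SET metadata_location = ?, previous_metadata_location = ?
         WHERE catalog_name = ? AND table_namespace = ? AND table_name = ?
           AND metadata_location = ?
        """,
        (new, expected, "dev", namespace, table, expected),
    )
    conn.commit()
    # Zero rows updated means another writer already moved the pointer.
    return cur.rowcount == 1


print(commit("db", "orders", V1, "metadata/00002-a.metadata.json"))  # True
print(commit("db", "orders", V1, "metadata/00002-b.metadata.json"))  # False
```

The `WHERE ... AND metadata_location = ?` clause is what turns an ordinary update into a compare-and-swap.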

In-Memory / Hadoop Catalog

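The in-memory catalog keeps table pointers in process memory, while the Hadoop catalog stores them as files in a local filesystem or HDFS path and relies on the filesystem’s atomic rename for commits. Both are primarily used for testing and single-engine deployments.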
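A filesystem-backed catalog can commit by refusing to create a metadata version that already exists. Iceberg’s Hadoop catalog writes numbered files (`v1.metadata.json`, `v2.metadata.json`, and so on) plus a `version-hint.text` file, using rename for the atomic step. The sketch below imitates that idea on a local disk with `os.link`, which fails if the target exists; it is a simplified illustration, not the Hadoop catalog’s code. Object stores such as S3 do not offer an atomic rename, which is one reason this style of catalog is limited to tests and single-engine setups.

```python
import json
import os
import tempfile


def commit_version(metadata_dir: str, base_version: int, metadata: dict) -> bool:
    """Publish v{base_version + 1}.metadata.json unless another writer already did."""
    new_version = base_version + 1
    target = os.path.join(metadata_dir, f"v{new_version}.metadata.json")

    # Write the complete file under a temporary name so readers never see a partial file.
    fd, tmp_path = tempfile.mkstemp(dir=metadata_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(metadata, f)

    try:
        # os.link raises FileExistsError if target exists: an atomic "create if absent".
        os.link(tmp_path, target)
    except FileExistsError:
        return False  # lost the race for this version number
    finally:
        os.remove(tmp_path)

    # The hint only speeds up finding the latest version; it is not the commit itself.
    with open(os.path.join(metadata_dir, "version-hint.text"), "w") as f:
        f.write(str(new_version))
    return True


with tempfile.TemporaryDirectory() as d:
    print(commit_version(d, 0, {"format-version": 2}))  # True
    print(commit_version(d, 0, {"format-version": 2}))  # False
```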

The Iceberg REST Catalog Specification

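The Iceberg REST Catalog is the most important catalog development in the Iceberg ecosystem since the table format itself. By defining a standard HTTP API for catalog operations, it lets any engine or client that speaks the protocol work with any conforming catalog service, without a catalog-specific client for each one.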

See the Iceberg REST Catalog entry for a full deep dive.

Catalog and Governance

Modern catalogs like Apache Polaris extend the basic catalog contract with governance capabilities, such as role-based access control over catalogs, namespaces, and tables.

This governance layer is what makes the catalog the appropriate integration point for enterprise lakehouse security.
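The reason the catalog is a natural enforcement point is that every engine must ask it for the metadata pointer before reading a table. The sketch below is hypothetical: the `GovernedCatalog` class, the principal names, and the `READ` privilege are illustrative, not Polaris’s actual model. Real systems also need storage-level protection (for example, credentials scoped to the table), since a check in the catalog alone does not stop direct reads of the files.

```python
from dataclasses import dataclass, field


@dataclass
class GovernedCatalog:
    """Hypothetical catalog that checks privileges before handing out pointers."""
    pointers: dict = field(default_factory=dict)  # table -> metadata location
    grants: dict = field(default_factory=dict)    # (principal, table) -> set of privileges

    def grant(self, principal: str, table: str, privilege: str) -> None:
        self.grants.setdefault((principal, table), set()).add(privilege)

    def load_table(self, principal: str, table: str) -> str:
        # One check here applies to every engine, since all of them resolve tables here.
        if "READ" not in self.grants.get((principal, table), set()):
            raise PermissionError(f"{principal} may not read {table}")
        return self.pointers[table]


gov = GovernedCatalog(pointers={"db.orders": "s3://lake/db/orders/metadata/00001.metadata.json"})
gov.grant("analyst", "db.orders", "READ")
print(gov.load_table("analyst", "db.orders"))
```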

📚 Go Deeper on Apache Iceberg

Alex Merced has authored three hands-on books covering Apache Iceberg, the Agentic Lakehouse, and modern data architecture. Pick up a copy to master the full ecosystem.
