Catalogs · Last updated: May 14, 2026

AWS Glue Catalog for Apache Iceberg

AWS Glue Data Catalog is Amazon's managed metadata catalog service with native support for Apache Iceberg tables, including an Iceberg REST Catalog API endpoint. It supports Iceberg workloads across AWS analytics services such as Athena, EMR, Glue ETL, and Redshift Spectrum.

aws glue iceberg catalog · glue data catalog iceberg · amazon iceberg catalog · aws iceberg integration · glue rest catalog


AWS Glue Data Catalog is Amazon Web Services’ managed metadata catalog service, deeply integrated with the AWS analytics ecosystem. Glue supports Apache Iceberg tables natively and also exposes an Iceberg REST Catalog-compatible endpoint, which makes it the natural catalog choice for teams running Iceberg workloads entirely within AWS.

Overview

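The AWS Glue Data Catalog serves as a centralized metadata repository for data assets in AWS. It stores the database and table definitions that services such as Athena, EMR, Glue ETL, and Redshift Spectrum use to locate and query data.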

Glue and Apache Iceberg

AWS Glue supports Iceberg through two modes:

Native Iceberg Table Support

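Glue can register and manage Iceberg tables directly. When Glue is used as an Iceberg catalog, each Glue table entry carries the parameter table_type=ICEBERG and records the location of the table's current metadata file in its metadata_location parameter. The metadata and data files themselves live in S3.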
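The core job of any Iceberg catalog is an atomic pointer swap. A writer reads the current metadata location, writes a new metadata file, and commits only if the pointer has not moved in the meantime. The sketch below is a simplified in-memory Python model of that optimistic compare-and-swap, meant purely as an illustration. PointerCatalog, CommitConflict, and the file names are hypothetical and are not Glue or Iceberg APIs.

```python
import threading


class CommitConflict(Exception):
    """Hypothetical error: another writer committed first."""


class PointerCatalog:
    """Toy model of a catalog: table name -> current metadata file location.

    Illustration only; Glue stores this pointer in the table's
    metadata_location parameter, but its real commit logic is not shown here.
    """

    def __init__(self):
        self._pointers = {}
        self._lock = threading.Lock()

    def current(self, table):
        # Location a writer reads before preparing its change (None if new).
        return self._pointers.get(table)

    def commit(self, table, expected, new_location):
        # Compare-and-swap: succeed only if the pointer is still `expected`.
        with self._lock:
            if self._pointers.get(table) != expected:
                raise CommitConflict(f"{table} changed since it was read")
            self._pointers[table] = new_location
```

A writer that hits CommitConflict would refresh the table, reapply its changes on top of the newer metadata, and retry. This is the optimistic-concurrency pattern Iceberg uses for concurrent writers.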

Iceberg REST Catalog Endpoint

AWS introduced an Iceberg REST Catalog-compatible endpoint for Glue, allowing engines and clients that support the REST Catalog API (PyIceberg, Spark with Iceberg REST config, Flink) to use Glue as their Iceberg catalog via the standard protocol.
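As a sketch of what connecting through the REST endpoint involves, the helper below builds catalog properties for PyIceberg. The regional URI pattern, the SigV4 property names, and the use of the AWS account ID as the warehouse are assumptions based on AWS and PyIceberg documentation at the time of writing. The region and account ID are placeholders, and the helper name is hypothetical.

```python
def glue_rest_catalog_properties(region: str, account_id: str) -> dict:
    """Build PyIceberg REST catalog properties for Glue's Iceberg endpoint.

    Endpoint URL and property names are assumptions from current docs;
    verify them before relying on this.
    """
    return {
        "type": "rest",
        # Glue's Iceberg REST endpoint is regional (assumed URL pattern).
        "uri": f"https://glue.{region}.amazonaws.com/iceberg",
        # Requests are signed with AWS SigV4 for the "glue" service.
        "rest.sigv4-enabled": "true",
        "rest.signing-name": "glue",
        "rest.signing-region": region,
        # The account ID selects that account's default Glue catalog.
        "warehouse": account_id,
    }


# Usage (needs pyiceberg installed and AWS credentials; not run here):
# from pyiceberg.catalog import load_catalog
# catalog = load_catalog("glue_rest",
#                        **glue_rest_catalog_properties("us-east-1", "123456789012"))
# print(catalog.list_namespaces())
```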

Using Glue with Apache Spark (EMR/Glue ETL)

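from pyspark.sql import SparkSession

# Register an Iceberg catalog named "glue_catalog" backed by the AWS Glue Data Catalog.
# Requires the Iceberg Spark runtime and AWS integration jars on the classpath;
# EMR and Glue ETL provide them when Iceberg support is enabled.
spark = SparkSession.builder \
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions") \
    .config("spark.sql.catalog.glue_catalog", "org.apache.iceberg.spark.SparkCatalog") \
    .config("spark.sql.catalog.glue_catalog.warehouse", "s3://my-bucket/warehouse/") \
    .config("spark.sql.catalog.glue_catalog.catalog-impl",
            "org.apache.iceberg.aws.glue.GlueCatalog") \
    .config("spark.sql.catalog.glue_catalog.io-impl",
            "org.apache.iceberg.aws.s3.S3FileIO") \
    .getOrCreate()

# Create the Glue database (namespace) if needed, then an Iceberg table in it
spark.sql("CREATE DATABASE IF NOT EXISTS glue_catalog.db")
spark.sql("""
    CREATE TABLE glue_catalog.db.orders (
        order_id BIGINT,
        order_date TIMESTAMP,
        total DOUBLE
    ) USING iceberg PARTITIONED BY (days(order_date))
""")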

Using Glue with AWS Athena

Athena has native Iceberg support using the Glue catalog:

-- Athena: create Iceberg table in Glue
CREATE TABLE orders (
    order_id bigint,
    order_date timestamp,
    total double
)
PARTITIONED BY (day(order_date))
LOCATION 's3://my-bucket/warehouse/db/orders/'
TBLPROPERTIES ('table_type'='ICEBERG');

-- Time travel: read the table as it was at a point in time
SELECT * FROM orders FOR TIMESTAMP AS OF TIMESTAMP '2026-01-01 00:00:00 UTC';
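Conceptually, a timestamp time-travel query resolves to the snapshot that was current at the requested instant. That is the last snapshot committed at or before that time. The sketch below models the lookup over a time-ordered snapshot log with bisect. The function name and snapshot IDs are hypothetical, and the real resolution happens inside the query engine and the Iceberg library.

```python
from bisect import bisect_right
from datetime import datetime, timezone


def snapshot_as_of(log, as_of):
    """Return the snapshot ID that was current at `as_of`.

    `log` is a list of (committed_at, snapshot_id) pairs sorted by time,
    a simplified stand-in for an Iceberg table's snapshot log.
    """
    times = [committed_at for committed_at, _ in log]
    i = bisect_right(times, as_of)  # snapshots committed at or before as_of
    if i == 0:
        raise ValueError("no snapshot exists at or before the requested time")
    return log[i - 1][1]


# Hypothetical snapshot log for illustration.
UTC = timezone.utc
example_log = [
    (datetime(2025, 12, 30, tzinfo=UTC), 101),
    (datetime(2026, 1, 1, tzinfo=UTC), 102),
    (datetime(2026, 1, 5, tzinfo=UTC), 103),
]
```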

AWS Lake Formation Integration

AWS Lake Formation provides fine-grained access control for Glue-cataloged Iceberg tables, such as table-level and column-level permissions granted to IAM principals.

Lake Formation integrates with Glue to enforce these permissions across the Lake Formation-integrated AWS services that query the catalog, including Athena, EMR, and Redshift Spectrum.
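For example, a column-level grant can be issued through the Lake Formation GrantPermissions API. The hypothetical helper below only builds the request arguments in the shape that boto3's lakeformation client accepts for grant_permissions. The role ARN, database, table, and column names are placeholders. Actually sending the request, as shown in the commented-out usage, requires AWS credentials and Lake Formation administrator rights.

```python
def column_select_grant(principal_arn, database, table, columns):
    """Build kwargs for Lake Formation GrantPermissions: SELECT on the given columns only."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                # Only these columns become visible to the principal.
                "ColumnNames": list(columns),
            }
        },
        "Permissions": ["SELECT"],
    }


# Usage (placeholders; requires AWS credentials, not run here):
# import boto3
# boto3.client("lakeformation").grant_permissions(
#     **column_select_grant("arn:aws:iam::123456789012:role/analyst",
#                           "db", "orders", ["order_id", "order_date"]))
```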

Glue vs. Apache Polaris for AWS Workloads

| Consideration | AWS Glue | Apache Polaris (via Dremio) |
|---|---|---|
| AWS service integration | Native | Via REST Catalog API |
| Multi-cloud | AWS only | Cloud-agnostic |
| Open source | No | Yes |
| Credential vending | Via IAM roles | Native REST spec |
| Catalog-level branching | No | No (use Nessie) |
| Governance | Lake Formation | Built-in RBAC |

For teams building on AWS exclusively, Glue is the most frictionless choice. For multi-cloud or cross-engine portability requirements, Apache Polaris (available via Dremio Cloud or self-hosted) offers broader interoperability.

Pricing

AWS Glue Data Catalog has a free tier (the first 1 million stored objects) with pay-per-use pricing beyond that. Iceberg metadata operations such as table loads and commits are Glue API requests, so they are billed as requests and count against Glue API request quotas. For very high-throughput Iceberg workloads, factor catalog request costs and quotas into architecture decisions.
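To judge whether catalog costs matter for a workload, a back-of-the-envelope estimate is usually enough. The function below is a hypothetical helper. Its default rates are assumptions based on published us-east-1 pricing at the time of writing: 1 million objects and 1 million requests free per month, then USD 1.00 per 100,000 objects and USD 1.00 per million requests. Check the AWS Glue pricing page for current, region-specific rates.

```python
def glue_catalog_monthly_cost(objects, requests,
                              free_objects=1_000_000, usd_per_100k_objects=1.0,
                              free_requests=1_000_000, usd_per_million_requests=1.0):
    """Rough monthly Glue Data Catalog cost in USD.

    All default rates and free-tier sizes are assumptions; pass current
    values for your region.
    """
    billable_objects = max(0, objects - free_objects)
    billable_requests = max(0, requests - free_requests)
    storage_cost = billable_objects / 100_000 * usd_per_100k_objects
    request_cost = billable_requests / 1_000_000 * usd_per_million_requests
    return storage_cost + request_cost
```

For example, 1.2 million objects and 51 million requests in a month would come to about USD 52 under these assumed rates (USD 2 for storage, USD 50 for requests). In this example, request volume rather than object count drives the bill.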

📚 Go Deeper on Apache Iceberg

Alex Merced has authored three hands-on books covering Apache Iceberg, the Agentic Lakehouse, and modern data architecture. Pick up a copy to master the full ecosystem.
