
dbt and Apache Iceberg

dbt (data build tool) transforms raw Iceberg table data into clean, tested, documented analytical models using SQL. With native Iceberg materialization support in the dbt-spark and dbt-trino adapters, dbt has become the standard SQL transformation layer in the Iceberg lakehouse.


dbt (data build tool) is the dominant SQL-first data transformation framework used by data engineers and analytics engineers worldwide. It enables teams to define data models as SQL SELECT statements, manage dependencies between models, run data quality tests, and auto-generate documentation — all from a version-controlled SQL codebase.

Apache Iceberg is the ideal storage format for dbt models in a lakehouse architecture: dbt’s idempotent table materializations map naturally to Iceberg’s transactional table creation and overwrite capabilities, and dbt’s incremental materializations align with Iceberg’s merge-on-read (MoR) and copy-on-write (CoW) strategies.
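dbt builds its dependency graph from ref() calls inside model SQL. A minimal sketch, using model names that appear later on this page (orders_clean, agg_revenue):

-- models/gold/agg_revenue.sql: depends on orders_clean via ref()
SELECT
    region,
    DATE_TRUNC('month', order_date) AS order_month,
    SUM(total_usd) AS revenue
FROM {{ ref('orders_clean') }}  -- dbt resolves this reference and records lineage
GROUP BY region, DATE_TRUNC('month', order_date)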

dbt Adapters for Iceberg

dbt connects to query engines via adapters. The most common Iceberg adapters:

dbt-spark (with Iceberg)

The most mature Iceberg + dbt combination. Point dbt at a Spark Thrift Server in profiles.yml:

# profiles.yml
my_profile:
  target: dev
  outputs:
    dev:
      type: spark
      method: thrift
      host: spark-thrift-host
      port: 10001
      schema: analytics
      connect_retries: 5

Because dbt connects to an already-running Spark Thrift Server, the Iceberg catalog must be configured when that server starts, e.g. in spark-defaults.conf:

# spark-defaults.conf
spark.sql.catalog.iceberg        org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.iceberg.type   rest
spark.sql.catalog.iceberg.uri    https://my-polaris.example.com

dbt-trino

Queries Iceberg tables via Trino:

# profiles.yml
my_profile:
  target: dev
  outputs:
    dev:
      type: trino
      host: trino.example.com
      port: 443
      database: iceberg_catalog # Trino catalog name
      schema: analytics
      http_scheme: https # JWT auth requires HTTPS
      method: jwt
      jwt_token: "{{ env_var('TRINO_JWT_TOKEN') }}"
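With dbt-trino, Iceberg-specific table settings pass through the Trino Iceberg connector’s table properties via the adapter’s properties config. A minimal sketch (the model name and partitioning transform here are illustrative):

-- models/silver/orders_by_month.sql (illustrative)
{{ config(
    materialized='table',
    properties={
        "format": "'PARQUET'",
        "partitioning": "ARRAY['month(order_date)']"
    }
) }}

SELECT * FROM {{ source('bronze', 'raw_orders') }}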

dbt-dremio

Dremio has a dbt adapter that queries Iceberg tables through Dremio’s semantic layer:

# profiles.yml
my_profile:
  target: dev
  outputs:
    dev:
      type: dremio
      host: my-dremio.example.com
      port: 9047
      user: "{{ env_var('DREMIO_USER') }}"
      password: "{{ env_var('DREMIO_PASSWORD') }}"
      database: "$scratch"
      schema: analytics

dbt Model Types and Iceberg

Table Materialization (Full Refresh)

Creates or replaces the Iceberg table each run — equivalent to CREATE OR REPLACE TABLE ... AS SELECT:

-- models/silver/orders_clean.sql
{{ config(
    materialized='table',
    file_format='iceberg',
    partition_by=['months(order_date)', 'region'],
    tblproperties={
        'write.target-file-size-bytes': '268435456',
        'format-version': '2'
    }
) }}

SELECT
    order_id,
    customer_id,
    CAST(total AS DECIMAL(18,2)) AS total_usd,
    order_date,
    region,
    UPPER(status) AS status
FROM {{ source('bronze', 'raw_orders') }}
WHERE order_id IS NOT NULL
  AND total > 0
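Under the hood, dbt-spark compiles this materialization to roughly the following Spark SQL (the exact statement varies by adapter version):

-- Approximate compiled SQL for the table materialization above
CREATE OR REPLACE TABLE analytics.orders_clean
USING iceberg
PARTITIONED BY (months(order_date), region)
TBLPROPERTIES (
    'write.target-file-size-bytes' = '268435456',
    'format-version' = '2'
)
AS SELECT ...  -- the model's SELECT body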

Incremental Materialization (Append / Merge)

Only processes new or changed rows — the core pattern for large production Iceberg tables:

-- models/silver/events.sql
{{ config(
    materialized='incremental',
    file_format='iceberg',
    incremental_strategy='append',  -- or 'merge', 'insert_overwrite'
    unique_key='event_id',
    partition_by=['days(event_ts)']
) }}

SELECT
    event_id,
    user_id,
    event_type,
    event_ts,
    properties
FROM {{ source('bronze', 'raw_events') }}

{% if is_incremental() %}
    WHERE event_ts > (SELECT MAX(event_ts) FROM {{ this }})
{% endif %}

Incremental strategies for Iceberg:

- append: inserts new rows only; fastest, but can duplicate rows if the source re-delivers them.
- merge: upserts on unique_key, compiling to an Iceberg MERGE INTO (see the sketch below); the right choice for CDC feeds and late-arriving updates.
- insert_overwrite: atomically replaces only the partitions touched by the run; useful for partition-level backfills.
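With the merge strategy, dbt wraps the model in a transactional Iceberg MERGE, roughly of this shape (table and column names follow the events model above):

-- Approximate SQL dbt generates for incremental_strategy='merge'
MERGE INTO analytics.events AS target
USING (/* compiled model SELECT, filtered by is_incremental() */) AS source
ON target.event_id = source.event_id      -- unique_key
WHEN MATCHED THEN UPDATE SET *            -- update existing rows
WHEN NOT MATCHED THEN INSERT *            -- append new rows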

dbt Tests on Iceberg Tables

dbt’s built-in tests validate Iceberg table data quality after each run:

# models/schema.yml
version: 2

models:
  - name: orders_clean
    columns:
      - name: order_id
        tests:
          - not_null
          - unique
      - name: total_usd
        tests:
          - not_null
          - dbt_utils.expression_is_true:
              expression: ">= 0"
      - name: status
        tests:
          - accepted_values:
              values: ["PENDING", "SHIPPED", "DELIVERED", "CANCELLED"]

Failed tests surface data quality issues before downstream consumers see bad data, complementing Iceberg’s write-audit-publish (WAP) pattern for pipeline safety.
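Beyond these schema tests, a singular test is just a SQL file under tests/ that returns the rows violating an assertion; the file name below is hypothetical:

-- tests/assert_no_future_orders.sql
-- The test fails if any order is dated in the future
SELECT order_id, order_date
FROM {{ ref('orders_clean') }}
WHERE order_date > CURRENT_DATE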

dbt Sources and Iceberg

Register existing Iceberg tables as dbt sources for lineage tracking:

# models/sources.yml
version: 2

sources:
  - name: bronze
    database: iceberg_catalog
    schema: bronze
    tables:
      - name: raw_orders
        description: "Raw orders from operational DB CDC"
        loaded_at_field: _loaded_at
        freshness:
          warn_after: { count: 4, period: hour }
          error_after: { count: 12, period: hour }
      - name: raw_events
        description: "Raw clickstream events from Kafka"

dbt’s source freshness check (dbt source freshness) validates that Iceberg source tables are being updated within their SLA — alerting on pipeline delays.
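Under the hood, the freshness check compares loaded_at_field against the current timestamp, with a query roughly like:

-- Approximate query dbt runs for the raw_orders freshness check
SELECT
    MAX(_loaded_at)   AS max_loaded_at,
    CURRENT_TIMESTAMP AS snapshotted_at
FROM iceberg_catalog.bronze.raw_orders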

dbt Docs and Iceberg Catalog

dbt docs generate creates a rich documentation site from your models, with:

- model and column descriptions from your schema.yml files
- the full lineage graph (DAG) across sources and models
- the compiled SQL and test coverage for each model

This dbt documentation layer complements the AI Semantic Layer in Dremio — dbt documents the transformation logic; Dremio’s semantic layer exposes the result to AI agents.

dbt + Iceberg in the Medallion Architecture

The classic lakehouse flow with dbt:

Bronze (raw Iceberg) → dbt → Silver (clean Iceberg) → dbt → Gold (metrics Iceberg)

- sources: raw_orders, raw_events, raw_customers (Iceberg, CDC-loaded)
- staging models: stg_orders, stg_events (dbt: type casting, renaming)
- intermediate models: int_orders_enriched (dbt: business-logic joins)
- mart models: fct_orders, dim_customers, agg_revenue (dbt: gold layer)
- downstream: Dremio Virtual Datasets → AI Semantic Layer → AI agent analytics
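A minimal sketch of the bronze-to-silver staging step from this flow (column names are illustrative):

-- models/staging/stg_orders.sql: type casting and renaming only
{{ config(materialized='view') }}

SELECT
    order_id,
    customer_id,
    CAST(total AS DECIMAL(18,2)) AS total_usd,
    CAST(order_date AS DATE)     AS order_date
FROM {{ source('bronze', 'raw_orders') }}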

📚 Go Deeper on Apache Iceberg

Alex Merced has authored three hands-on books covering Apache Iceberg, the Agentic Lakehouse, and modern data architecture. Pick up a copy to master the full ecosystem.
