Last updated: May 14, 2026


Write-Audit-Publish (WAP) Pattern with Apache Iceberg

The Write-Audit-Publish (WAP) pattern is a data pipeline quality assurance workflow that uses Iceberg’s branching capabilities to enforce a three-stage commit process:

  1. Write: New data is written to an isolated Iceberg branch (the “staging” or “audit” branch) — invisible to production consumers.
  2. Audit: Automated data quality checks run against the staging branch to validate the new data.
  3. Publish: If all checks pass, the staging branch is merged/fast-forwarded to main — making the data visible to production. If checks fail, the branch is discarded without affecting production.

WAP eliminates a fundamental risk in data pipelines: bad data reaching production consumers. Without WAP, a pipeline that writes corrupted data to the main branch immediately breaks dashboards, reports, and AI agents.

The WAP Problem Without Iceberg

Traditional WAP implementations require:

  * A separate staging table or directory, which means physically copying the new data before it can be validated
  * A swap, rename, or reload step to promote audited data into production, which is rarely atomic
  * Cleanup jobs to remove the staged copy after publishing (or after a failed audit)

Iceberg’s branching makes WAP zero-copy and atomic:

  * The staging branch shares data files with main, so nothing is duplicated
  * Publishing is a fast-forward of the main branch pointer, a single atomic metadata operation
  * Discarding a failed batch just drops the branch reference; production is never touched
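
For contrast, here is a minimal sketch of the copy-based flow Iceberg replaces (db.orders_staging is a hypothetical scratch table; note the data is written twice and the publish step is not atomic):

# Sketch of the traditional, copy-based WAP flow (hypothetical table names)

# Write: physically copy the batch into a separate staging table
spark.sql("""
    CREATE TABLE db.orders_staging AS
    SELECT * FROM staging.raw_orders WHERE batch_date = '2026-05-14'
""")

# Audit: run quality checks against db.orders_staging ...

# Publish: copy the data a second time into production (not atomic)
spark.sql("INSERT INTO db.orders SELECT * FROM db.orders_staging")

# Cleanup: drop the staged copy
spark.sql("DROP TABLE db.orders_staging")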

WAP Implementation with Iceberg Branches

Step 1: Create a Staging Branch

-- Create a staging branch from the current main snapshot
ALTER TABLE db.orders CREATE BRANCH wap_staging;
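
If a pipeline run can be abandoned, the branch can also be created with an explicit retention policy so stale staging branches expire on their own. A sketch using Iceberg’s Spark branch DDL (the RETAIN clause is supported in recent Iceberg versions):

# Sketch: create the staging branch with a retention policy so an
# abandoned branch eventually expires (requires Iceberg's Spark SQL extensions)
spark.sql("ALTER TABLE db.orders CREATE BRANCH wap_staging RETAIN 7 DAYS")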

Or in Python:

table.manage_snapshots().create_branch("wap_staging").commit()
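
Iceberg’s Spark documentation also shows enabling WAP on the table via a table property before staged writes; a sketch (depending on your Iceberg version, this may only be required for the older spark.wap.id workflow rather than branch-based WAP):

# Sketch: enable WAP on the table, per Iceberg's Spark write docs
spark.sql("""
    ALTER TABLE db.orders
    SET TBLPROPERTIES ('write.wap.enabled' = 'true')
""")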

Step 2: Write to the Staging Branch

# Spark: write only to the staging branch
spark.conf.set("spark.wap.branch", "wap_staging")

# All subsequent writes go to wap_staging, not main
spark.sql("""
    INSERT INTO db.orders
    SELECT * FROM staging.raw_orders WHERE batch_date = '2026-05-14'
""")

Step 3: Audit (Data Quality Checks)

# Read from staging branch for validation
spark.conf.set("spark.wap.branch", "wap_staging")

quality_checks = [
    # No nulls in required fields
    spark.sql("SELECT COUNT(*) FROM db.orders WHERE order_id IS NULL").collect()[0][0] == 0,
    # No future order dates
    spark.sql("SELECT COUNT(*) FROM db.orders WHERE order_date > CURRENT_DATE()").collect()[0][0] == 0,
    # Revenue is positive
    spark.sql("SELECT COUNT(*) FROM db.orders WHERE total < 0").collect()[0][0] == 0,
    # Sanity check: total row count exceeds an expected floor
    spark.sql("SELECT COUNT(*) FROM db.orders").collect()[0][0] > 1000,
]

all_passed = all(quality_checks)
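
Each check above is a separate full scan. As a sketch, the same audits can be collapsed into one aggregating query so the staging branch is scanned once, producing the same all_passed flag (column names as in the example above):

# Sketch: run all audits in a single scan of the staging branch
# (spark.wap.branch is still set, so db.orders reads wap_staging)
row = spark.sql("""
    SELECT
        COUNT(*)                                                      AS total_rows,
        SUM(CASE WHEN order_id IS NULL THEN 1 ELSE 0 END)             AS null_ids,
        SUM(CASE WHEN order_date > CURRENT_DATE() THEN 1 ELSE 0 END)  AS future_dates,
        SUM(CASE WHEN total < 0 THEN 1 ELSE 0 END)                    AS negative_totals
    FROM db.orders
""").collect()[0]

all_passed = (
    row.total_rows > 1000
    and row.null_ids == 0
    and row.future_dates == 0
    and row.negative_totals == 0
)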

Step 4: Publish or Discard

if all_passed:
    # Publish: fast-forward main to the staging snapshot
    table.manage_snapshots() \
        .fast_forward("main", "wap_staging") \
        .commit()
    print("Data quality checks passed. Published to main.")
else:
    # Discard: drop the staging branch without affecting main
    table.manage_snapshots() \
        .remove_branch("wap_staging") \
        .commit()
    raise ValueError("Data quality checks failed. Staging branch discarded.")
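
The same publish and discard steps can be run from Spark SQL; a sketch using Iceberg’s fast_forward procedure and branch DDL (replace catalog_name with your configured catalog):

if all_passed:
    # Publish: atomically move main's pointer to the staging snapshot
    spark.sql(
        "CALL catalog_name.system.fast_forward('db.orders', 'main', 'wap_staging')"
    )
else:
    # Discard: drop the branch reference; no data files are touched
    spark.sql("ALTER TABLE db.orders DROP BRANCH wap_staging")

After a successful fast-forward, the staging branch can also be dropped (or left to expire via a retention policy) so the next run can recreate it cleanly.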

WAP in a Full Airflow Pipeline

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator, BranchPythonOperator

with DAG(
    "orders_wap_pipeline",
    schedule="@daily",
    start_date=datetime(2026, 1, 1),
    catchup=False,
) as dag:

    create_branch = PythonOperator(
        task_id="create_wap_branch",
        python_callable=lambda: table.manage_snapshots()
            .create_branch("wap_staging").commit()
    )

    load_data = PythonOperator(
        task_id="load_to_staging_branch",
        python_callable=run_etl_to_staging_branch
    )

    audit_data = BranchPythonOperator(
        task_id="audit_data",
        python_callable=run_quality_checks,  # returns 'publish' or 'discard'
    )

    publish = PythonOperator(
        task_id="publish",
        python_callable=fast_forward_to_main
    )

    discard = PythonOperator(
        task_id="discard",
        python_callable=drop_staging_branch
    )

    create_branch >> load_data >> audit_data >> [publish, discard]
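
The DAG references helper callables that are not shown. As a sketch, a hypothetical run_quality_checks re-runs the Step 3 audits and returns the task_id that BranchPythonOperator expects:

def run_quality_checks() -> str:
    """Audit the staging branch and pick the downstream task.

    Hypothetical helper: assumes a `spark` session is available to the
    worker; returns the task_id BranchPythonOperator should follow.
    """
    spark.conf.set("spark.wap.branch", "wap_staging")
    bad_rows = spark.sql("""
        SELECT COUNT(*) FROM db.orders
        WHERE order_id IS NULL OR order_date > CURRENT_DATE() OR total < 0
    """).collect()[0][0]
    return "publish" if bad_rows == 0 else "discard"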

WAP Benefits

| Benefit | Description |
| --- | --- |
| Zero-copy staging | Staging branch shares files with main — no data duplication |
| Atomic publish | Fast-forward is an instantaneous metadata operation |
| Safe rollback | Discard the branch without affecting production consumers |
| Full Iceberg features on staging | Time travel, schema inspection, row counts — all work on the staging branch |
| Parallel pipeline testing | Multiple branches can be validated simultaneously |

WAP vs. Catalog-Level Branching (Nessie)

Iceberg WAP operates at the table level — each table has its own staging branch. Project Nessie provides catalog-level WAP — a single branch that spans all tables in the catalog, enabling cross-table atomic staging and publishing. For pipelines that update multiple tables in lockstep, Nessie’s catalog-level branches provide stronger consistency guarantees.
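
As a sketch of what catalog-level WAP looks like, Nessie’s Spark SQL extensions expose branch commands at catalog scope (this assumes a Spark catalog named nessie configured with the Nessie extensions; exact syntax varies by Nessie version):

# Sketch: catalog-level WAP with Nessie's Spark SQL extensions.
# One branch spans every table in the catalog, so multi-table
# changes publish atomically with a single merge.
spark.sql("CREATE BRANCH IF NOT EXISTS wap_staging IN nessie FROM main")
spark.sql("USE REFERENCE wap_staging IN nessie")

# ... write to any number of tables, run audits ...

spark.sql("MERGE BRANCH wap_staging INTO main IN nessie")   # publish
# or: spark.sql("DROP BRANCH wap_staging IN nessie")        # discard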
