File & Metadata Layer Last updated: May 14, 2026

Iceberg Equality Deletes

Equality delete files in Apache Iceberg record column values that identify the rows to be deleted. This enables row-level deletes without knowing physical row positions, which suits business-logic deletes (GDPR erasure, CDC by primary key) in Merge-on-Read mode.


Iceberg Equality Delete Files

Equality delete files are the second type of delete file in Apache Iceberg Spec v2. Unlike positional deletes (which require knowing the exact physical position of each deleted row), equality delete files record column values that identify deleted rows — allowing any engine to identify and exclude those rows without knowing their physical location in the data files.

Structure of an Equality Delete File

An equality delete file is an ordinary Iceberg file (Parquet, Avro, or ORC) containing the columns used to identify deleted rows. For example, if you delete rows by customer_id:

customer_id
12345
99876
54321

These entries mean: “When reading this table, exclude any row where customer_id is 12345, 99876, or 54321.”

The delete file can include multiple columns to form a compound key:

order_id   region
1001       us-east
2045       eu-west

Each entry deletes only rows where both columns match that entry. A row with order_id 1001 and region eu-west, for example, is not deleted.
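The compound-key rule can be sketched in a few lines of Python. This is an illustrative in-memory model, not Iceberg code: DELETE_ENTRIES, is_deleted, and the amount column are hypothetical names introduced for the example.

```python
# Hypothetical in-memory model of the compound-key delete entries shown above.
DELETE_ENTRIES = [
    {"order_id": 1001, "region": "us-east"},
    {"order_id": 2045, "region": "eu-west"},
]


def is_deleted(row, entries=DELETE_ENTRIES):
    """A row is deleted if it equals some single entry on every column of that entry."""
    return any(
        all(row.get(col) == val for col, val in entry.items())
        for entry in entries
    )


rows = [
    {"order_id": 1001, "region": "us-east", "amount": 40},  # matches the first entry
    {"order_id": 1001, "region": "eu-west", "amount": 15},  # order_id matches, region does not
    {"order_id": 2045, "region": "eu-west", "amount": 99},  # matches the second entry
]

# Only the row that matches no entry on all columns survives.
surviving = [r for r in rows if not is_deleted(r)]
```

The second row survives because its order_id and region come from two different entries; a match on one column alone is not enough.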

How Equality Deletes Are Applied During Reads

The query engine applies equality delete files in a join-like operation:

  1. The engine identifies which equality delete files apply to the data files being scanned.
  2. For each batch of data file rows, the engine checks whether any row matches a delete entry.
  3. Matching rows are excluded from the result.

This is equivalent to a NOT IN or ANTI JOIN filter applied to the scan. It is more expensive than positional deletes (which are simple position lookups) but more flexible (no position knowledge required).
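The three steps above can be sketched as a hash-based anti-join. This is a minimal sketch that assumes delete files and data batches are plain lists of dictionaries; load_delete_keys and scan are hypothetical helper names, and real engines implement this inside their readers.

```python
def load_delete_keys(delete_files, key_columns):
    """Union the entries of all applicable delete files into one hash set of key tuples."""
    return {
        tuple(entry[c] for c in key_columns)
        for delete_file in delete_files
        for entry in delete_file
    }


def scan(batches, delete_keys, key_columns):
    """Yield rows batch by batch, dropping any row whose key is in the delete set."""
    for batch in batches:
        for row in batch:
            if tuple(row[c] for c in key_columns) not in delete_keys:
                yield row


# Two delete files that both apply to the scanned data (step 1).
delete_files = [
    [{"customer_id": 12345}, {"customer_id": 99876}],
    [{"customer_id": 54321}],
]
batches = [
    [{"customer_id": 12345, "event": "login"}, {"customer_id": 777, "event": "view"}],
    [{"customer_id": 54321, "event": "purchase"}, {"customer_id": 888, "event": "login"}],
]

keys = load_delete_keys(delete_files, ["customer_id"])
# Steps 2 and 3: check every row against the delete set and keep the non-matches.
result = list(scan(batches, keys, ["customer_id"]))
```

Every scanned row costs one hash lookup, which is where the extra read cost relative to positional deletes comes from.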

When Equality Deletes Are Generated

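Equality deletes are generated by writers that know the key of a row to delete but not where that row is stored, for example CDC pipelines that delete by primary key.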

Equality Deletes for GDPR Compliance

Equality deletes are the ideal mechanism for GDPR right-to-erasure:

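-- Spark SQL: delete all rows for a specific user across a table.
-- The table must use merge-on-read ('write.delete.mode' = 'merge-on-read');
-- the default copy-on-write mode rewrites data files instead of writing delete files.
DELETE FROM db.user_events
WHERE user_id = 12345;
-- → expressed as an equality delete, this is the entry {user_id: 12345}
-- (note: Spark's own merge-on-read writer emits positional delete files;
--  writers such as Flink in upsert mode emit equality delete files)

DELETE FROM db.orders
WHERE customer_id = 12345;
-- → expressed as an equality delete: {customer_id: 12345}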

The deletion is:

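  1. Immediate (logical): Writing the delete files is a small commit, so it completes quickly. Once the commit succeeds, the user’s data is excluded from every query that reads the new snapshot.
  2. Physical erasure (deferred): Run compaction after deletion to physically remove the user’s data from the Parquet files, then expire the older snapshots that still reference the original files. Physical removal is required for full GDPR compliance.

-- Compact after GDPR deletions to physically erase data.
-- Depending on file sizes, the 'delete-file-threshold' option may be needed
-- so that data files with attached deletes are selected for rewriting.
CALL system.rewrite_data_files('db.user_events');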

Equality Delete Files vs. Positional Delete Files

Dimension                    Equality Deletes                    Positional Deletes
What is recorded             Column value(s)                     File path + row position
Position knowledge required  No                                  Yes
Read cost                    High (join-like scan)               Low (position lookup)
Write cost                   Very low                            Low (requires position tracking)
Best for                     Deletes by key: GDPR erasure, CDC   Deletes where the writer has already located the rows
Generated by                 Flink upsert/CDC writers            Spark merge-on-read DML
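The difference in what each delete type records can be illustrated side by side. The sketch below models data files as a dictionary from path to rows; the file names and the apply_positional and apply_equality helpers are hypothetical.

```python
data_files = {
    "file-a.parquet": [{"user_id": 7, "v": "x"}, {"user_id": 9, "v": "y"}],
    "file-b.parquet": [{"user_id": 7, "v": "z"}],
}


def apply_positional(files, positions):
    """positions: set of (file_path, row_index) pairs; each pair drops exactly one row."""
    return {
        path: [r for i, r in enumerate(rows) if (path, i) not in positions]
        for path, rows in files.items()
    }


def apply_equality(files, column, values):
    """values: set of column values; every matching row in every file is dropped."""
    return {
        path: [r for r in rows if r[column] not in values]
        for path, rows in files.items()
    }


# The positional delete must already know where the row lives.
by_position = apply_positional(data_files, {("file-a.parquet", 0)})
# The equality delete only knows the value, and removes it wherever it appears.
by_value = apply_equality(data_files, "user_id", {7})
```

Here the positional delete removes one row from file-a.parquet and leaves file-b.parquet untouched, while the equality delete on user_id 7 removes matching rows from both files.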

Equality Delete Performance Considerations

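Equality delete files are the more expensive delete type for reads because they require checking every scanned row against the delete entries. Performance degrades as equality delete files accumulate between compactions, since each scan must load and check more delete entries.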

Mitigation: Run compaction regularly to apply accumulated equality deletes and produce clean data files.
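What compaction achieves can be sketched as follows. This is a simplified model, assuming data files are lists of rows and the pending deletes are a set of key values; compact is a hypothetical function, not the Iceberg rewrite procedure.

```python
def compact(data_files, delete_keys, key_column):
    """Rewrite each data file without deleted rows; drop files that end up empty."""
    rewritten = []
    for rows in data_files:
        kept = [r for r in rows if r[key_column] not in delete_keys]
        if kept:
            rewritten.append(kept)
    # After the rewrite no data file contains a deleted row, so the
    # accumulated delete entries no longer need to be consulted at read time.
    return rewritten, set()


data_files = [
    [{"user_id": 12345, "page": "/home"}, {"user_id": 1, "page": "/cart"}],
    [{"user_id": 12345, "page": "/pay"}],
]
pending_deletes = {12345}

clean_files, remaining_deletes = compact(data_files, pending_deletes, "user_id")
```

After the rewrite, readers scan the clean files directly with no per-row delete check.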

Equality Delete File Scope

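Like positional delete files, equality delete files are scoped in their manifest entries. An equality delete file applies only to data files in the same partition (or to every partition if it was written with an unpartitioned spec), and only to data files with a lower data sequence number, meaning files committed before the delete. As a result, equality delete files for customer_id = 12345 are only loaded when scanning partitions that could contain that customer’s data.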
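The scoping check can be sketched as a filter over delete file metadata. The classes and the applicable_deletes function below are hypothetical simplifications. Beyond the partition scoping described above, the sketch also assumes the Iceberg spec rule that an equality delete applies only to data files with a strictly lower data sequence number, and it models partitions as plain strings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataFile:
    path: str
    partition: str
    sequence_number: int


@dataclass(frozen=True)
class EqualityDeleteFile:
    path: str
    partition: Optional[str]  # None models a delete written with an unpartitioned spec
    sequence_number: int


def applicable_deletes(data_file, delete_files):
    """Return the delete files a reader must load for this data file."""
    return [
        d for d in delete_files
        if (d.partition is None or d.partition == data_file.partition)
        and data_file.sequence_number < d.sequence_number
    ]


deletes = [
    EqualityDeleteFile("eq-1.parquet", "region=us-east", 5),
    EqualityDeleteFile("eq-2.parquet", "region=eu-west", 6),
    EqualityDeleteFile("eq-3.parquet", None, 7),
]
old_us = DataFile("a.parquet", "region=us-east", 3)  # written before all deletes
new_us = DataFile("b.parquet", "region=us-east", 6)  # written after eq-1

for_old = [d.path for d in applicable_deletes(old_us, deletes)]
for_new = [d.path for d in applicable_deletes(new_us, deletes)]
```

The older file picks up its partition's delete plus the global one, while the newer file skips eq-1.parquet because that delete was committed before the file was written.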

