Iceberg Equality Delete Files
Equality delete files are the second type of delete file in the Apache Iceberg v2 spec. Unlike positional deletes, which require knowing the exact physical position of each deleted row, equality delete files record column values that identify deleted rows, so any engine can exclude matching rows without knowing their physical location in the data files.
Structure of an Equality Delete File
An equality delete file is a columnar file (Parquet or Avro) containing the columns used to identify deleted rows. For example, if you delete rows by customer_id:
| customer_id |
|---|
| 12345 |
| 99876 |
| 54321 |
These entries mean: “When reading this table, exclude any row where customer_id is 12345, 99876, or 54321.”
The delete file can include multiple columns to form a compound key:
| order_id | region |
|---|---|
| 1001 | us-east |
| 2045 | eu-west |
These entries delete rows only where both conditions match.
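A minimal sketch of how compound-key matching works, representing the delete file's entries as a set of tuples over the equality columns (the column names and rows are the illustrative ones from the tables above, not engine code):

```python
# Sketch: equality delete entries as a set of tuples over the equality columns.
equality_columns = ("order_id", "region")

# Entries from the compound-key delete file above.
delete_entries = {
    (1001, "us-east"),
    (2045, "eu-west"),
}

def is_deleted(row: dict) -> bool:
    """A row is deleted only if ALL equality columns match one entry."""
    key = tuple(row[c] for c in equality_columns)
    return key in delete_entries

rows = [
    {"order_id": 1001, "region": "us-east"},  # full match -> excluded
    {"order_id": 1001, "region": "eu-west"},  # partial match -> kept
    {"order_id": 3000, "region": "us-east"},  # no match -> kept
]

visible = [r for r in rows if not is_deleted(r)]
# visible contains 2 rows; only the full match is excluded.
```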
How Equality Deletes Are Applied During Reads
The query engine applies equality delete files in a join-like operation:
- The engine identifies which equality delete files apply to the data files being scanned.
- For each batch of data file rows, the engine checks whether any row matches a delete entry.
- Matching rows are excluded from the result.
This is equivalent to a NOT IN or ANTI JOIN filter applied to the scan. It is more expensive than positional deletes (which are simple position lookups) but more flexible (no position knowledge required).
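The three steps above can be sketched as one anti-join pass per delete file over each scanned batch (a conceptual model of the read path, not an actual engine implementation; the delete-file shape here is made up):

```python
# Conceptual model of applying equality deletes during a scan:
# each delete file contributes a key set, and every scanned batch is
# filtered against every applicable delete file (an anti-join per file).

def load_delete_keys(delete_file):
    """Stand-in for reading a Parquet/Avro delete file into a key set."""
    return set(delete_file["keys"])

def scan_with_deletes(batches, delete_files, key_column):
    delete_sets = [load_delete_keys(f) for f in delete_files]
    for batch in batches:
        for delete_keys in delete_sets:  # one pass per delete file
            batch = [row for row in batch if row[key_column] not in delete_keys]
        yield from batch

delete_files = [{"keys": [12345, 99876]}, {"keys": [54321]}]
batches = [
    [{"customer_id": 12345}, {"customer_id": 777}],
    [{"customer_id": 54321}, {"customer_id": 888}],
]

result = list(scan_with_deletes(batches, delete_files, "customer_id"))
# Only the rows for customer_id 777 and 888 survive.
```

The inner loop is why accumulated delete files hurt read performance: each one adds a full filtering pass.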
When Equality Deletes Are Generated
Equality deletes are generated by:
- Apache Spark `DELETE` and `UPDATE` statements (in merge-on-read mode): when `DELETE FROM orders WHERE customer_id = 12345` runs in MoR mode, Spark writes an equality delete file with `{customer_id: 12345}`.
- Apache Flink upsert with equality field specification: when Flink's CDC upsert sink applies an UPDATE or DELETE event, it can generate equality delete files if positional information is unavailable.
- Batch GDPR erasure scripts: a common compliance pattern is a daily batch job that generates equality delete files for all GDPR erasure requests received that day.
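The batch-erasure pattern can be sketched as a small daily job that collects the day's requests into a deduplicated set of delete entries (the request format and field names here are hypothetical; a real job would write these entries out as a Parquet equality delete file via its engine):

```python
from datetime import date

# Hypothetical erasure-request records collected during the day.
requests = [
    {"received": date(2024, 5, 1), "user_id": 12345},
    {"received": date(2024, 5, 1), "user_id": 99876},
    {"received": date(2024, 4, 30), "user_id": 111},  # handled yesterday
]

def build_delete_entries(requests, day):
    """Deduplicated equality delete entries for one day's requests."""
    return sorted({r["user_id"] for r in requests if r["received"] == day})

entries = build_delete_entries(requests, date(2024, 5, 1))
# entries == [12345, 99876]; these become the rows of an equality
# delete file on column user_id, written in a single commit.
```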
Equality Deletes for GDPR Compliance
Equality deletes are the ideal mechanism for GDPR right-to-erasure:
```sql
-- Spark: delete all rows for a specific user across a table
DELETE FROM db.user_events
WHERE user_id = 12345;
-- → writes equality delete file: {user_id: 12345}

DELETE FROM db.orders
WHERE customer_id = 12345;
-- → writes equality delete file: {customer_id: 12345}
```
The deletion is:
- Immediate (logical): The equality delete files are written in seconds. The user’s data is immediately excluded from all query results.
- Physical erasure (deferred): Run compaction after deletion to physically remove the user’s data from Parquet files — required for full GDPR compliance.
```sql
-- Compact after GDPR deletions to physically erase data
CALL system.rewrite_data_files('db.user_events');
```
Equality Delete Files vs. Positional Delete Files
| Dimension | Equality Deletes | Positional Deletes |
|---|---|---|
| What is recorded | Column value(s) | File path + row position |
| Position knowledge required | No | Yes |
| Read cost | High (join-like scan) | Low (position lookup) |
| Write cost | Very low | Low (requires position tracking) |
| Best for | Business-logic deletes, GDPR | Streaming CDC in Flink |
| Generated by | Spark DML, batch scripts | Flink CDC, Spark streaming |
Equality Delete Performance Considerations
Equality delete files are the more expensive delete type for reads because they require checking every scanned row against the delete entries. Performance degrades when:
- Many equality delete files accumulate: Each file adds a join-like pass over the data.
- Delete files cover many rows: Large equality delete files increase lookup overhead.
- Scans cross many partitions: Delete files must be applied to every partition that intersects with the deleted rows.
Mitigation: Run compaction regularly to apply accumulated equality deletes and produce clean data files.
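A rough cost model of the first failure mode: because each equality delete file adds a membership check per scanned row, N accumulated files multiply the per-scan delete work, while compaction folds the deletes into the data files so subsequent scans do no delete work at all (the numbers are illustrative only):

```python
# Illustrative cost model: each equality delete file adds one
# membership check per scanned row.

def checks_per_scan(num_rows, num_delete_files):
    return num_rows * num_delete_files

before = checks_per_scan(1_000_000, 50)  # 50 accumulated delete files
after = checks_per_scan(1_000_000, 0)    # after compaction applies them

# before == 50_000_000 checks per scan; after == 0 — compaction
# removes the per-scan delete overhead entirely.
```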
Equality Delete File Scope
Like positional delete files, equality delete files are scoped in their manifest entries. This scoping typically covers the partitions that contain the deleted rows, so equality delete files for customer_id = 12345 are only loaded when scanning partitions that could contain that customer’s data.
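Partition scoping can be sketched as a lookup from the scanned partitions to the applicable delete files, so delete files for partitions outside the scan are never loaded (the manifest layout below is a simplification; real manifest entries carry per-file partition data, not a single string):

```python
# Simplified manifest: each equality delete file is recorded with the
# partition it applies to.
manifest = [
    {"path": "deletes/d1.parquet", "partition": "2024-05-01"},
    {"path": "deletes/d2.parquet", "partition": "2024-05-02"},
    {"path": "deletes/d3.parquet", "partition": "2024-06-01"},
]

def delete_files_for_scan(manifest, scan_partitions):
    """Load only delete files whose partition intersects the scan."""
    return [e["path"] for e in manifest if e["partition"] in scan_partitions]

# A scan over May 1-2 skips the June delete file entirely.
needed = delete_files_for_scan(manifest, {"2024-05-01", "2024-05-02"})
```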