Tag: Data Engineering
All the articles with the tag "Data Engineering".
- 27 MIN READ•May 22, 2026
Apache Iceberg SCD Type 2 and CDC Patterns: Building Historical Lakehouse Tables
A deep dive into implementing Slowly Changing Dimension Type 2 (SCD Type 2) patterns and Change Data Capture (CDC) pipelines on Apache Iceberg, using PySpark and Dremio.
apache iceberg, cdc, scd type 2
- 24 MIN READ•May 22, 2026
Apache Iceberg Catalogs Explained: REST, Glue, Hive Metastore, Polaris, Nessie, and Snowflake
A deep dive into Apache Iceberg catalog architecture, comparing REST catalogs, AWS Glue, Project Nessie, Polaris, and Snowflake. Learn what role the catalog plays, how credential vending works, and how to configure catalogs across engines.
apache iceberg, catalogs, Nessie
- 24 MIN READ•May 22, 2026
Maintaining Apache Iceberg Tables: Compaction, Snapshot Expiration, and Orphan File Cleanup
An in-depth guide to orchestrating maintenance operations on Apache Iceberg tables, covering bin-packing, sort-based, and Z-Order compaction, as well as snapshot expiration and orphan file removal, with query acceleration details for the Dremio engine.
Apache Iceberg, Compaction, Data Engineering