6 min read
Saved February 14, 2026
Do you care about this?
This article covers data quality design patterns used in data engineering: WAP, AWAP, TAP, and the Signal Table Pattern. Each pattern protects data integrity by validating and auditing data before it is published to production.
If you do, here's more
The article outlines four data quality design patterns that data engineers can use: WAP (Write–Audit–Publish), AWAP (Audit–Write–Audit–Publish), TAP (Transform–Audit–Publish), and the Signal Table Pattern. Each pattern checks data quality before the data reaches production, reducing the risk that bad data affects downstream systems. The article stresses that no single pattern is definitively better; the right choice depends on the use case and on factors such as data size, platform limitations, and service-level agreements.
WAP, the primary pattern discussed, draws from traditional software engineering principles. It involves writing data to a temporary area, performing quality checks, and only then promoting the data to production if it passes validation. This concept is not entirely new, as earlier data warehouse designs also included staging areas, but WAP formalizes this process as a protective measure against bad data. The article elaborates on two implementation methods for WAP: the Two-Phase WAP, which uses two physical table copies, and the One-Phase WAP, which leverages modern data lakehouse technologies for zero-copy operations.
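The write, audit, and publish steps described above can be sketched as follows. This is a minimal illustration of the Two-Phase variant, not the article's own code: it uses Python's built-in sqlite3 module as a stand-in for a warehouse, and the table names (orders, orders_staging) and the three checks are hypothetical.

```python
import sqlite3

def write_audit_publish(conn, rows):
    """Two-phase WAP sketch: stage a batch, audit it, then copy it to production."""
    cur = conn.cursor()

    # Write: land the batch in a staging table, never directly in production.
    # (orders_staging is a hypothetical name chosen for this example.)
    cur.execute("DROP TABLE IF EXISTS orders_staging")
    cur.execute("CREATE TABLE orders_staging (order_id INTEGER, amount REAL)")
    cur.executemany("INSERT INTO orders_staging VALUES (?, ?)", rows)
    conn.commit()  # keep the staged batch around so a failed audit can be inspected

    # Audit: run quality checks against the staged copy only.
    row_count = cur.execute("SELECT COUNT(*) FROM orders_staging").fetchone()[0]
    null_ids = cur.execute(
        "SELECT COUNT(*) FROM orders_staging WHERE order_id IS NULL"
    ).fetchone()[0]
    negative_amounts = cur.execute(
        "SELECT COUNT(*) FROM orders_staging WHERE amount < 0"
    ).fetchone()[0]
    passed = row_count > 0 and null_ids == 0 and negative_amounts == 0

    # Publish: promote the batch only if every check passed.
    # This physical copy is what makes the variant "two-phase".
    if passed:
        cur.execute("INSERT INTO orders SELECT * FROM orders_staging")
        conn.commit()
    return passed

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")

good = write_audit_publish(conn, [(1, 9.99), (2, 25.0)])     # clean batch
bad = write_audit_publish(conn, [(3, -5.0), (None, 12.0)])   # fails two checks
published = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

In this sketch the failed batch never touches the production table, which is the protective property WAP is meant to provide; the cost is that every published row is stored twice during the run.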
Implementation strategies vary based on technology. For example, the DIY approach using Pandas offers flexibility but requires significant engineering effort and can lead to increased costs due to duplicated data. Conversely, using Snowflake’s zero-copy clones allows for quicker validation without duplicating storage, while Apache Iceberg introduces a branching mechanism that mimics Git for data, enabling atomic and consistent changes. The article emphasizes the importance of these patterns in safeguarding production data and enhancing trust in data-driven systems.
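As a rough illustration of the zero-copy idea, the sketch below builds the SQL statements a Snowflake-based WAP run might issue. It is an assumption about how such a flow could look, not the article's implementation: the helper name, the table names, and the choice of ALTER TABLE ... SWAP WITH as the publish step are all hypothetical, and the function only returns statement strings rather than connecting to Snowflake.

```python
def snowflake_clone_wap_plan(prod, staging, load_sql):
    """Return the SQL for a clone-based WAP run, grouped by phase.

    Hypothetical helper: the caller executes the "write" statements,
    runs its own audit queries against `staging`, and executes the
    "publish" statements only if the audit passes.
    """
    return {
        "write": [
            # A zero-copy clone shares storage with the source table until rows change.
            f"CREATE OR REPLACE TABLE {staging} CLONE {prod}",
            # Load the new batch into the clone, leaving production untouched.
            load_sql,
        ],
        "publish": [
            # Swap the audited clone into place in a single statement.
            f"ALTER TABLE {prod} SWAP WITH {staging}",
        ],
    }

plan = snowflake_clone_wap_plan(
    prod="orders",
    staging="orders_wap",
    load_sql="INSERT INTO orders_wap SELECT * FROM raw_orders",
)
for phase in ("write", "publish"):
    for statement in plan[phase]:
        print(f"{phase}: {statement}")
```

Splitting the plan into phases keeps the audit as an explicit gate between writing and publishing, so a failed check simply means the publish statements are never run.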