10 min read | Saved February 14, 2026
Richard Glew argues that data quality testing can be improved by applying established software testing principles. He highlights the differences between software and data engineering, emphasizing the need for both a structured quality strategy and the involvement of non-technical users in the process. The article sets the stage for practical strategies in future installments.
Data quality remains a significant problem: according to the Harvard Business Review, only 3% of companies' data meets basic quality standards. Richard Glew highlights the gap between advances in software testing and the slower progress of data quality practices. He argues that companies must adopt structured approaches to data testing, similar to those used in software, to address the problem effectively. The article explains why it matters to define quality explicitly and to implement a Quality Strategy (QS) tailored to specific data engineering workflows, such as ETL or ELT processes.
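The summary does not say how Glew expresses a definition of quality in practice. As a minimal sketch, assuming quality is written down as measurable rules with thresholds for one ETL output table, the Python below scores a batch of rows on two common dimensions, completeness and uniqueness. `QualityRule`, `score`, `evaluate`, and the sample columns are hypothetical names chosen for illustration, not part of the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityRule:
    """One measurable quality requirement (hypothetical structure)."""
    name: str
    column: str
    dimension: str    # "completeness" or "uniqueness"
    min_score: float  # lowest acceptable score, between 0.0 and 1.0


def score(rows: list[dict], rule: QualityRule) -> float:
    """Return a score between 0.0 and 1.0 for one rule over a batch of rows.

    An empty batch passes vacuously here; a real strategy might treat it as a failure.
    """
    values = [row.get(rule.column) for row in rows]
    if rule.dimension == "completeness":
        # Share of rows where the column is present and not None.
        return sum(v is not None for v in values) / len(values) if values else 1.0
    if rule.dimension == "uniqueness":
        # Share of non-null values that are distinct.
        non_null = [v for v in values if v is not None]
        return len(set(non_null)) / len(non_null) if non_null else 1.0
    raise ValueError(f"unknown dimension: {rule.dimension}")


def evaluate(rows: list[dict], rules: list[QualityRule]) -> dict[str, bool]:
    """Map each rule name to whether the batch meets that rule's threshold."""
    return {rule.name: score(rows, rule) >= rule.min_score for rule in rules}


# Hypothetical sample batch from an orders table.
orders = [
    {"order_id": 1, "email": "a@example.com"},
    {"order_id": 2, "email": None},
    {"order_id": 2, "email": "c@example.com"},
]
rules = [
    QualityRule("email_complete", "email", "completeness", 0.6),
    QualityRule("order_id_unique", "order_id", "uniqueness", 1.0),
]
print(evaluate(orders, rules))  # {'email_complete': True, 'order_id_unique': False}
```

Writing thresholds down this way may also make the definition of quality reviewable by non-technical stakeholders; which dimensions and thresholds to use would come from the Quality Strategy itself.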
Glew introduces the concept of a Test Strategy (TS), which builds on the QS by specifying the requirements and tools for testing data workflows. He highlights the "Test Pyramid" approach, which organizes tests into levels, from fast unit tests at the base to more complex production tests at the top. This structure lets data engineers manage test complexity while ensuring that end consumers see only the relevant higher-level tests. Implementing automated tests within an agile, DevOps framework is essential because it improves both efficiency and test coverage.
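To make the pyramid concrete, here is a hedged Python sketch of its lower two levels using the standard `unittest` module. The transformation `to_cents`, the batch step `transform_batch`, and the test names are hypothetical; separating single-function unit tests from batch-level invariant tests is one common way to layer data tests, not necessarily the exact structure Glew proposes.

```python
import unittest
from decimal import ROUND_HALF_UP, Decimal, InvalidOperation


# Hypothetical transformation step: turn raw amount strings such as
# " 1,234.50 " into integer cents, rounding half-cents up.
def to_cents(raw: str) -> int:
    cleaned = raw.strip().replace(",", "")
    return int((Decimal(cleaned) * 100).to_integral_value(rounding=ROUND_HALF_UP))


def transform_batch(rows: list[dict]) -> list[dict]:
    """Apply to_cents to every row of a (hypothetical) raw orders extract."""
    return [{"id": r["id"], "amount_cents": to_cents(r["amount"])} for r in rows]


class ToCentsUnitTest(unittest.TestCase):
    """Base of the pyramid: fast, isolated checks of a single function."""

    def test_strips_whitespace_and_separators(self):
        self.assertEqual(to_cents(" 1,234.50 "), 123450)

    def test_rounds_half_cent_up(self):
        self.assertEqual(to_cents("0.005"), 1)

    def test_rejects_non_numeric_input(self):
        with self.assertRaises(InvalidOperation):
            to_cents("n/a")


class TransformBatchTest(unittest.TestCase):
    """Middle of the pyramid: checks invariants over a whole batch."""

    def test_preserves_ids_and_totals(self):
        raw = [{"id": 1, "amount": "10.00"}, {"id": 2, "amount": "2.5"}]
        out = transform_batch(raw)
        self.assertEqual([r["id"] for r in out], [1, 2])
        self.assertEqual(sum(r["amount_cents"] for r in out), 1250)
```

Running this with `python -m unittest` on every commit is one way to automate the checks in a DevOps pipeline. Production tests at the top of the pyramid would run against live data and are not shown here.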
The article also addresses the challenges specific to data testing. Unlike software applications, data workflows often lack direct input validation, so issues may only surface in production. This unpredictability makes data quality harder to manage. Glew stresses that repairing data issues costs significantly more than investing in early testing. He also notes organizational inertia: existing policies on data governance and security complicate the adoption of modern testing practices. Scrutiny of data handling, especially in regulated industries, adds another layer of complexity.
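One way to compensate for missing input validation is to check records at the ingestion boundary and quarantine failures instead of loading them. The sketch below assumes a simple per-feed schema expressed as a Python dict; `EXPECTED_SCHEMA`, `validate`, `ingest`, and `IngestResult` are hypothetical, and a production pipeline would more likely rely on a dedicated schema or data-validation tool.

```python
from dataclasses import dataclass, field

# Hypothetical contract for an incoming feed: column name -> required type.
EXPECTED_SCHEMA = {"customer_id": int, "country": str, "signup_year": int}


@dataclass
class IngestResult:
    accepted: list = field(default_factory=list)
    quarantined: list = field(default_factory=list)  # (record, problems) pairs


def validate(record: dict) -> list[str]:
    """Return the problems found in one record; an empty list means valid."""
    problems = []
    for column, expected_type in EXPECTED_SCHEMA.items():
        value = record.get(column)
        if value is None:
            problems.append(f"missing {column}")
        # Exact type match, so a bool (a subclass of int) is rejected for int columns.
        elif type(value) is not expected_type:
            problems.append(f"{column} should be {expected_type.__name__}")
    unexpected = sorted(set(record) - set(EXPECTED_SCHEMA))
    if unexpected:
        problems.append(f"unexpected columns: {unexpected}")
    return problems


def ingest(records: list[dict]) -> IngestResult:
    """Split a batch into accepted rows and quarantined rows with reasons."""
    result = IngestResult()
    for record in records:
        problems = validate(record)
        if problems:
            result.quarantined.append((record, problems))
        else:
            result.accepted.append(record)
    return result


batch = [
    {"customer_id": 1, "country": "NZ", "signup_year": 2021},
    {"customer_id": "2", "country": "AU"},
]
outcome = ingest(batch)
print(len(outcome.accepted))      # 1
print(outcome.quarantined[0][1])  # ['customer_id should be int', 'missing signup_year']
```

Quarantining keeps bad rows visible together with their reasons rather than silently dropping them, so problems can be caught before they reach production consumers.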