SparkDQ is a data quality framework built for PySpark that lets users define and run data quality checks directly within their Spark pipelines. It supports both declarative configuration and programmatic check definitions, so teams can catch data issues early without adding complexity to their workflows. Checks can be applied at various stages of data processing, which helps teams trust the data their pipelines produce.
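To make the distinction between declarative and programmatic checks concrete, here is a minimal sketch in plain Python. It is not SparkDQ's API: the names `Check`, `not_null`, `in_range`, `RULE_FACTORIES`, `check_from_config`, and `run_checks` are hypothetical, and rows are plain dictionaries standing in for a Spark DataFrame. It only illustrates how the same kind of rule could be built either from a config dictionary or directly in code.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass
class Check:
    """A named row-level rule; `passes` returns True when the row is valid."""
    check_id: str
    passes: Callable[[Row], bool]


def not_null(column: str) -> Callable[[Row], bool]:
    # Rule factory: the row is valid when the column holds a non-None value.
    return lambda row: row.get(column) is not None


def in_range(column: str, min_value: float, max_value: float) -> Callable[[Row], bool]:
    # Rule factory: the row is valid when the value lies in [min_value, max_value].
    def rule(row: Row) -> bool:
        value = row.get(column)
        return value is not None and min_value <= value <= max_value
    return rule


# Hypothetical registry mapping declarative "type" names to rule factories.
RULE_FACTORIES = {"not_null": not_null, "in_range": in_range}


def check_from_config(config: Dict[str, Any]) -> Check:
    # Declarative path: everything except check_id/type is passed to the factory.
    params = {k: v for k, v in config.items() if k not in ("check_id", "type")}
    return Check(config["check_id"], RULE_FACTORIES[config["type"]](**params))


def run_checks(rows: List[Row], checks: List[Check]) -> Dict[int, List[str]]:
    # Map each failing row index to the ids of the checks it violated.
    failures: Dict[int, List[str]] = {}
    for index, row in enumerate(rows):
        failed = [check.check_id for check in checks if not check.passes(row)]
        if failed:
            failures[index] = failed
    return failures


rows = [
    {"id": 1, "amount": 10.0},
    {"id": None, "amount": 5.0},
    {"id": 3, "amount": -2.0},
]

checks = [
    # Programmatic definition: the check is constructed directly in code.
    Check("id-not-null", not_null("id")),
    # Declarative definition: the same kind of object built from a config dict.
    check_from_config({
        "check_id": "amount-range",
        "type": "in_range",
        "column": "amount",
        "min_value": 0,
        "max_value": 100,
    }),
]

failures = run_checks(rows, checks)
print(failures)
```

In this sketch both paths produce the same `Check` object, so a pipeline could load some rules from a configuration file and define others inline without the runner needing to know the difference.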
sparkdq
data-quality
pyspark
validation