This article explores integrating Flink, Airflow, and StarRocks for real-time data processing. It compares data ingestion methods, including StarRocks routine loads and Kafka connectors, and shares lessons learned from implementation. The author ultimately prefers the Flink connector for its flexibility and its fit with the team's existing infrastructure.
This article outlines how a team at Astronomer transformed their data pipeline creation process by adopting a standardized, modular approach. They implemented a declarative framework using Airflow Task Groups, allowing them to automate repetitive tasks, improve efficiency, and focus on core business logic rather than boilerplate code.
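To illustrate the declarative idea in miniature: a pipeline can be described as data and expanded into tasks by a small factory, so engineers write the spec rather than the boilerplate. This is a hedged sketch only; the names (`pipeline_spec`, `build_tasks`, `make_step`) are hypothetical and the article's actual framework is built on Airflow Task Groups, which are omitted here to keep the example self-contained.

```python
from typing import Callable, Dict

# Hypothetical declarative spec: the pipeline is described as plain data.
pipeline_spec = {
    "name": "orders_daily",
    "steps": ["extract", "validate", "load"],
}

def make_step(pipeline: str, step: str) -> Callable[[], str]:
    """Build one task callable; in Airflow this would be a task inside a TaskGroup."""
    def run() -> str:
        return f"{pipeline}.{step} done"
    return run

def build_tasks(spec: dict) -> Dict[str, Callable[[], str]]:
    """Expand the declarative spec into named task callables, preserving order."""
    return {step: make_step(spec["name"], step) for step in spec["steps"]}

tasks = build_tasks(pipeline_spec)
print(list(tasks))          # → ['extract', 'validate', 'load']
print(tasks["extract"]())   # → orders_daily.extract done
```

In a real Airflow deployment, `build_tasks` would instead emit operators inside a `TaskGroup`, but the payoff is the same: repetitive wiring is generated from the spec, leaving only business logic to write by hand.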
Maintaining high data quality is challenging due to unclear ownership, bugs, and messy source data. By embedding continuous testing within Airflow's data workflows, teams can proactively address quality issues, ensuring data integrity and building trust with consumers while fostering shared responsibility across data engineering and business domains.
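The core pattern is a quality gate inside the workflow: a task runs rule-based checks on each batch and fails loudly before bad data reaches consumers. The sketch below is an assumption-laden illustration (the `checks` dict, `run_checks`, and the sample rows are all hypothetical, not an API from the article); in Airflow, raising on failure would fail the task and trigger alerting to the owning team.

```python
# Illustrative batch with two deliberate quality violations.
rows = [
    {"order_id": 1, "amount": 25.0},
    {"order_id": 2, "amount": -3.0},    # violates the non-negative rule
    {"order_id": None, "amount": 10.0}, # violates the not-null rule
]

# Each check is a named predicate a row must satisfy.
checks = {
    "order_id_not_null": lambda r: r["order_id"] is not None,
    "amount_non_negative": lambda r: r["amount"] >= 0,
}

def run_checks(batch, checks):
    """Return a mapping of failing check names to offending row counts."""
    failures = {}
    for name, predicate in checks.items():
        bad = sum(1 for row in batch if not predicate(row))
        if bad:
            failures[name] = bad
    return failures

failures = run_checks(rows, checks)
if failures:
    # In an orchestrated pipeline, raising here stops downstream tasks.
    print(f"quality gate failed: {failures}")
```

Keeping the checks as data alongside the pipeline makes ownership visible: domain teams can add rules for their tables without touching orchestration code.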
A local data platform can be built with Terraform and Docker to replicate a cloud data architecture without incurring cloud costs. This setup allows hands-on experimentation with data engineering concepts using popular open-source tools such as Airflow, Minio, and DuckDB. The project applies infrastructure-as-code principles while providing a realistic environment for developing data pipelines.