Links
This article traces a Change Data Capture (CDC) pipeline's migration from JSON to Avro. As the business grew, JSON's limitations became apparent, and adopting Avro improved performance, reduced storage costs, and simplified schema evolution.
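Much of Avro's storage advantage comes from writing the schema once and encoding only field values per record, instead of repeating field names in every message as JSON does. The stdlib-only Python sketch below illustrates that idea with a fixed-layout binary encoding; it is not Avro itself, and the example record is hypothetical.

```python
import json
import struct

# A hypothetical CDC record.
record = {"id": 12345, "amount": 99.99, "ts": 1700000000}

# JSON repeats every field name in every record.
json_bytes = json.dumps(record).encode("utf-8")

# With a schema known out-of-band (as in Avro), only the values need
# to be encoded: here an int64, a float64, and an int64 = 24 bytes.
bin_bytes = struct.pack("<qdq", record["id"], record["amount"], record["ts"])

print(len(json_bytes), len(bin_bytes))  # the binary form is roughly half the size
```

Real Avro adds variable-length zig-zag integer encoding and schema-evolution rules on top of this, so its savings on wide records are typically even larger.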
This article lists notable data engineering projects from late December 2025. It features a variety of pipelines and platforms, highlighting their purposes and technologies used, like Airflow, Kafka, and machine learning tools. Users can explore, vote, and share their own projects within the community.
This article explores integrating Flink, Airflow, and StarRocks for real-time data processing. It compares ingestion methods, including StarRocks Routine Load jobs and Kafka connectors, and shares lessons learned from the implementation. The author ultimately prefers the Flink connector for its flexibility and fit with existing infrastructure.
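A Routine Load, one of the ingestion options compared above, is defined by a SQL statement that tells StarRocks to pull continuously from a Kafka topic. The Python sketch below just assembles such a statement as a string; the database, table, topic, and broker names are hypothetical, and the statement shape is a minimal sketch of the Routine Load syntax rather than a production configuration.

```python
def routine_load_ddl(db: str, table: str, topic: str, brokers: str) -> str:
    """Build a minimal CREATE ROUTINE LOAD statement that continuously
    ingests JSON records from a Kafka topic into a StarRocks table."""
    return f'''
CREATE ROUTINE LOAD {db}.load_{table} ON {table}
PROPERTIES ("format" = "json")
FROM KAFKA (
    "kafka_broker_list" = "{brokers}",
    "kafka_topic" = "{topic}"
);
'''.strip()

# Hypothetical names for illustration only.
ddl = routine_load_ddl("analytics", "events", "events_topic", "kafka1:9092")
print(ddl)
```

With a Routine Load, the broker runs inside StarRocks itself; the Flink connector the author prefers instead keeps ingestion logic in Flink jobs, where transformations and existing operational tooling already live.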
Building Kafka on top of S3 presents several challenges, including data consistency, latency, and efficient data retrieval. The article examines these obstacles in depth and discusses the potential solutions and architectural considerations needed to make the integration work effectively.
Kafka and Flink are core tools for real-time data processing and streaming, which makes them worth learning for Python data engineers. Proficiency in both strengthens an engineer's ability to build robust pipelines and manage data workflows, and can meaningfully improve job prospects in data-centric roles.