4 links
tagged with all of: big-data + data-engineering
Links
The article introduces Apache Spark 4.0, highlighting its new features, performance improvements, and enhancements aimed at simplifying data processing tasks. It emphasizes the importance of this release for developers and data engineers seeking to leverage Spark's capabilities for big data analytics and machine learning applications.
The article introduces PyIceberg, a tool designed to help data engineers manage and query large datasets efficiently. It emphasizes the importance of handling data in motion and how PyIceberg integrates with modern data infrastructure to streamline processes. Key features and use cases are highlighted to showcase its effectiveness in data engineering workflows.
Netflix has developed a Real-Time Distributed Graph (RDG) to address the complexities arising from its evolving business model, which includes streaming, ads, and gaming. The first part of this series details the architecture and the ingestion pipeline, which processes vast amounts of data to support fast querying and insights.
Tulika Bhatt, a senior software engineer at Netflix, discusses her experiences with large-scale data processing and the challenges of managing impression data for personalization. She emphasizes the need for a balance between off-the-shelf solutions and custom-built systems while highlighting the complexities of ensuring data quality and observability in high-speed environments. The conversation also touches on the future of data engineering technologies and the impact of generative AI on data management practices.