Links
FlinkSketch offers a collection of sketching algorithms for Apache Flink's DataStream API. It includes implementations for frequency counts, distinct counts, and quantiles, allowing efficient analytics on streaming data. Users can build and run applications with custom sketches and benchmark their performance.
This article explores the differences between batch and real-time data pipelines, highlighting when each approach is appropriate. It outlines the trade-offs in terms of complexity, cost, and use-case fit, and introduces the concept of hybrid pipelines that allow flexibility in data processing.
Organizations face the challenge of integrating real-time streaming analytics with traditional batch processing cost-effectively. Fresha has built a Data Lakehouse platform on AWS using tools such as Apache Paimon and StarRocks, combining the advantages of data lakes and data warehouses in a scalable, secure analytics infrastructure. The architecture also includes Kubernetes orchestration and cross-account secret management to support efficient data operations.