6 min read
Saved February 14, 2026
This article details performance improvements in Apache Hudi 1.1 for streaming ingestion with Apache Flink. Key optimizations include a leaner serialization path, a Flink-native writer, and lower memory overhead, which together yield significant gains in ingestion throughput.
Apache Hudi 1.1 introduces significant optimizations for streaming ingestion when paired with Apache Flink. As data volumes grow, maintaining high performance in real-time ingestion becomes increasingly difficult. Hudi 1.1 addresses issues like network shuffle overhead, serialization costs, and garbage collection challenges that arise from in-memory buffer management. The enhancements focus on improving throughput and reducing resource consumption during streaming jobs.
One key change is a new serialization path. Previously, each record went through multiple conversions, from Flink's RowData to Avro records, before reaching Hudi, which added avoidable overhead. Hudi 1.1 instead uses a custom data structure, HoodieFlinkInternalRow, that carries the RowData directly together with the metadata Hudi needs. This change increases streaming ingestion throughput by about 25%. In addition, Hudi now uses a Flink-native writer that eliminates redundant conversions, further reducing memory usage and GC pressure.
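To make the difference concrete, here is a minimal Python sketch, not Hudi's actual Java code. Rows are modeled as plain tuples; `FIELDS`, `legacy_path`, `native_path`, and `FlinkInternalRowSketch` are hypothetical names; and, as an assumption, the metadata is reduced to a record key and a partition path. The legacy function rebuilds the data into new container objects at each hop, while the wrapper simply keeps a reference to the original row.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple

# Stand-in for Flink's RowData: a positional tuple of field values.
Row = Tuple[Any, ...]

# Hypothetical table schema, used only for this illustration.
FIELDS = ("id", "name", "ts")


def legacy_path(row: Row, record_key: str, partition_path: str) -> Dict[str, Any]:
    """Simplified pre-1.1 style: row -> Avro-like record -> Hudi-style record.
    Every hop builds a new container object for the same data."""
    avro_like = dict(zip(FIELDS, row))  # hop 1: positional row -> named record
    return {                            # hop 2: wrap into a keyed record
        "key": record_key,
        "partition": partition_path,
        "payload": dict(avro_like),     # yet another container copy
    }


@dataclass(frozen=True)
class FlinkInternalRowSketch:
    """Hypothetical analogue of HoodieFlinkInternalRow: the original row
    travels unchanged, with only the metadata Hudi needs beside it."""
    record_key: str
    partition_path: str
    row: Row  # reference to the original row, not a converted copy


def native_path(row: Row, record_key: str, partition_path: str) -> FlinkInternalRowSketch:
    # A single, constant-size wrapper; the row itself is never converted.
    return FlinkInternalRowSketch(record_key, partition_path, row)


row = (42, "alice", 1_700_000_000)
wrapped = native_path(row, record_key="42", partition_path="dt=2026-02-14")
print(wrapped.row is row)  # True: no conversion happened
```

The point of the sketch is the shape of the data flow: per-record work in the wrapper path does not grow with the number of conversion hops, which is where the allocation and GC savings come from.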
Hudi 1.1 also optimizes log file writing for Merge-On-Read (MOR) tables. Instead of creating multiple temporary byte arrays during serialization, the writer now writes record bytes directly to the output stream, which reduces both write latency and garbage collection overhead.

Performance benchmarks comparing Hudi 1.1 with Hudi 1.0 and Apache Paimon 1.0.1 show concrete improvements in ingestion speed, addressing earlier claims that Paimon performs better. The tests were run in a controlled environment on an Alibaba Cloud EMR cluster, with the aim of making the results reproducible and transparent.
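The log-writing change for MOR tables can be illustrated with a small Python sketch. It assumes a made-up block layout (a 4-byte length prefix for the block and for each record) rather than Hudi's real log format, and the function names are hypothetical. Both functions emit identical bytes; the difference is how many intermediate buffers are created along the way.

```python
import io
import struct
from typing import BinaryIO, Sequence

# Hypothetical log-block layout for illustration only (not Hudi's format):
#   [4-byte block length][per record: 4-byte record length][record bytes]


def write_block_with_copies(records: Sequence[bytes], out: BinaryIO) -> None:
    """Copy-heavy path: each record is copied into a temporary buffer,
    and the whole block is copied once more before it reaches `out`."""
    staging = io.BytesIO()
    for rec in records:
        tmp = bytearray(rec)                    # temporary copy of the record
        staging.write(struct.pack(">I", len(tmp)))
        staging.write(tmp)
    block = staging.getvalue()                  # second copy: the full block
    out.write(struct.pack(">I", len(block)))
    out.write(block)


def write_block_direct(records: Sequence[bytes], out: BinaryIO) -> None:
    """Direct path: record sizes are known up front, so record bytes go
    straight to `out` through memoryviews, with no staging buffer."""
    block_len = sum(4 + len(rec) for rec in records)
    out.write(struct.pack(">I", block_len))
    for rec in records:
        out.write(struct.pack(">I", len(rec)))
        out.write(memoryview(rec))              # no intermediate byte array


first, second = io.BytesIO(), io.BytesIO()
records = [b"alpha", b"beta"]
write_block_with_copies(records, first)
write_block_direct(records, second)
print(first.getvalue() == second.getvalue())  # True
```

Computing the block length from the known record sizes is what lets the direct version skip the staging buffer; when sizes are not known in advance, some buffering is hard to avoid.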