4 min read | Saved February 14, 2026
Do you care about this?
The article discusses the shortcomings of the OpenTelemetry batch processor, particularly during Collector restarts, leading to data loss. It advocates for using exporter-level batching with persistent storage for better reliability and durability in production environments.
If you do, here's more
The OpenTelemetry batch processor has long been a popular component in Collector pipelines, but its behavior during restarts and failures has led the community to move away from it. When the Collector restarts, telemetry buffered in memory can be lost without generating alerts, resulting in silent data gaps. For example, if an application sends 100 traces and the Collector restarts before the batch is exported, all 100 traces can be lost. This happens because the batch processor acknowledges receipt of data before it is actually exported, creating an "at-most-once" delivery model.
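For reference, this is roughly what the at-most-once setup described above looks like. This is an illustrative sketch, not the article's exact configuration: the endpoint is a placeholder, and the batch processor's tuning values are example defaults.

```yaml
# Traditional pipeline: batching happens in an in-memory processor.
# Data is acknowledged to the sender as soon as the batch processor
# accepts it, so a Collector restart drops whatever is still buffered.
receivers:
  otlp:
    protocols:
      grpc:

processors:
  batch:
    timeout: 5s          # flush a partial batch after this long
    send_batch_size: 8192

exporters:
  otlp:
    endpoint: backend.example.com:4317   # placeholder backend

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
```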
In contrast, exporter-level batching integrates batching and queuing directly into the exporter, using persistent storage. This design ensures that telemetry is safely queued before sending any acknowledgments back to the sender. In testing, when a Collector using exporter-level batching was restarted after sending 100 traces, all traces were recovered. This approach shifts the model closer to "at-least-once" delivery, significantly improving data durability.
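The exporter-level alternative can be sketched along these lines. Note this is an assumption-laden example rather than a definitive configuration: the exact key names for exporter batching have shifted across recent Collector versions (e.g. an experimental `batcher` section versus batching options nested under `sending_queue`), so check the exporterhelper documentation for your version. The storage directory and endpoint are placeholders.

```yaml
# Exporter-level batching with a persistent queue: telemetry is written
# to disk before acknowledgment, so it survives a Collector restart.
extensions:
  file_storage:
    directory: /var/lib/otelcol/queue   # placeholder path; must be writable

receivers:
  otlp:
    protocols:
      grpc:

exporters:
  otlp:
    endpoint: backend.example.com:4317  # placeholder backend
    sending_queue:
      enabled: true
      storage: file_storage   # back the queue with the file_storage extension
      queue_size: 10000
    # Exporter-side batching; key names vary by Collector version.
    batcher:
      enabled: true
      flush_timeout: 5s

service:
  extensions: [file_storage]
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp]   # no batch processor needed in the pipeline
```

On restart, the Collector replays whatever is still in the on-disk queue, which is what moves delivery toward at-least-once semantics.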
The article emphasizes that the newer exporter-level batching simplifies the architecture by eliminating the need for separate in-memory buffers. It also allows for better management of backpressure. If storage fills up or the backend is saturated, that information can flow back through the pipeline, preventing data overload. While this new method may not be a universal fix for all scenarios, it aligns better with the reliability needs of many production environments. Teams should consider these differences carefully when configuring their OpenTelemetry Collectors.
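The backpressure point above maps onto the exporter's retry and queue settings. A hedged sketch, using the standard exporterhelper `retry_on_failure` options (values here are illustrative, and the endpoint is a placeholder): when the queue fills or the backend rejects data, the error propagates back through the pipeline to the receiver instead of silently dropping telemetry.

```yaml
exporters:
  otlp:
    endpoint: backend.example.com:4317   # placeholder backend
    retry_on_failure:
      enabled: true
      initial_interval: 5s      # first retry delay
      max_interval: 30s         # cap on backoff between retries
      max_elapsed_time: 300s    # give up (and surface an error) after this
    sending_queue:
      enabled: true
      queue_size: 5000          # once full, new data is rejected upstream
```

Rejected data causes the receiver to return an error to the sender, which SDKs can use to retry, so the sender, not the Collector, decides what to do under saturation.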