Saved February 14, 2026
This article details a real-world case of a goroutine leak in a Go application that grew silently over six weeks, impacting memory usage and response time. It outlines the symptoms, code issues, and the steps taken to identify and fix the leak, including the use of Uber's LeakProf tool.
The article describes a goroutine leak in a Go-based API service that grew over six weeks until it caused excessive memory usage and slow response times. By the time it was caught, the service was running 50,847 goroutines, consuming 47 GB of memory, and taking 32 seconds to respond. The early symptoms, sluggishness and a rising number of timeouts, went largely unnoticed until they culminated in a crisis at 3 AM on a Saturday. The author emphasizes monitoring goroutine counts and using `context.Context` to manage goroutine lifecycles so that such leaks are prevented or caught early.
The leak was traced to the WebSocket notification system, whose code looked correct at first glance but contained three bugs. First, and most important, the context's cancel function, which should have been called when a WebSocket connection closed, was never invoked. Second, the heartbeat ticker was never stopped, which retained memory unnecessarily, particularly in Go versions before 1.23. Third, the messages channel was never closed and continued to receive data, causing further memory growth.
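The effect of the first bug can be reproduced in isolation. The sketch below is not the article's code: `startHeartbeat` and `leakedAfter` are hypothetical stand-ins that simulate per-connection workers and compare the goroutine count when the cancel function is called with the count when it is forgotten.

```go
package main

import (
	"context"
	"fmt"
	"runtime"
	"time"
)

// startHeartbeat is a hypothetical stand-in for a per-connection worker.
// The goroutine it starts exits only when ctx is cancelled.
func startHeartbeat(ctx context.Context) {
	go func() {
		ticker := time.NewTicker(10 * time.Millisecond)
		defer ticker.Stop()
		for {
			select {
			case <-ctx.Done():
				return
			case <-ticker.C:
				// A real handler would send a ping to the client here.
			}
		}
	}()
}

// leakedAfter simulates n connections opening and closing, and returns how
// many extra goroutines are still alive afterwards. When callCancel is
// false it mimics the bug: the cancel functions exist but are never called.
func leakedAfter(n int, callCancel bool) int {
	before := runtime.NumGoroutine()
	cancels := make([]context.CancelFunc, 0, n)
	for i := 0; i < n; i++ {
		ctx, cancel := context.WithCancel(context.Background())
		cancels = append(cancels, cancel)
		startHeartbeat(ctx)
	}
	if callCancel {
		for _, cancel := range cancels {
			cancel()
		}
	}
	time.Sleep(100 * time.Millisecond) // give cancelled workers time to exit
	return runtime.NumGoroutine() - before
}

func main() {
	fmt.Println("extra goroutines with cancel:   ", leakedAfter(100, true))
	fmt.Println("extra goroutines without cancel:", leakedAfter(100, false))
}
```

With cancellation the extra count should fall back to roughly zero; without it, roughly one goroutine per simulated connection remains, which is the pattern that accumulates silently over weeks.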
To diagnose the problem, the team used Uber's LeakProf tool, which quickly revealed goroutines lingering after their WebSocket connections had closed. By comparing goroutine dumps against the list of active connections, they found that many subscriptions were still being held for users who were no longer connected. The fix involved adding cleanup handlers, explicitly stopping goroutines, and closing channels when connections terminated. It resolved the immediate crisis and established better practices for managing goroutine lifecycles going forward.