6 min read | Saved February 14, 2026
Do you care about this?
This article examines how Recall.ai diagnosed delays in establishing PostgreSQL connections during high-load meeting spikes. The root cause was the single-threaded postmaster process, which could not keep up with the surge of connection requests, producing multi-second latency.
If you do, here's more
Recall.ai handles a demanding workload, processing millions of meetings weekly through automated bots. A particular challenge is that meeting start times are synchronized, which creates sharp spikes in demand for compute and puts pressure on their media processing infrastructure. The company investigated one specific symptom: some EC2 instances took 10-15 seconds to connect to their PostgreSQL database. The delay was traced to the postmaster, the PostgreSQL process that accepts connections and manages worker processes.
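A minimal timing harness for chasing such delays might look like the sketch below (this is illustrative, not the article's code; the hostname is a placeholder). One caveat worth noting: raw TCP connect only measures the kernel handshake, which can succeed from the listen backlog even while the postmaster is stalled, so a full probe would complete a real PostgreSQL login on top of this timing skeleton.

```python
# Sketch of a connection-latency probe; host/port below are hypothetical.
import socket
import time

def time_tcp_connect(host, port, timeout=20.0):
    """Return seconds taken to establish (and close) one TCP connection."""
    t0 = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return time.perf_counter() - t0

if __name__ == "__main__":
    # Hypothetical database endpoint, for illustration only.
    elapsed = time_tcp_connect("db.example.internal", 5432)
    print(f"connect took {elapsed * 1000:.1f} ms")
```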
The postmaster runs a single-threaded loop that handles incoming connections, forking a backend process for each one. During peak times, the influx of connection requests overwhelms this loop, causing noticeable delays; profiling showed the postmaster could not keep up beyond roughly 1,400 connections per second. The team also found that enabling huge pages in Linux reduced fork() overhead (there is less page-table state to copy per fork), yielding a 20% increase in connection throughput. Background workers for parallel queries are also forked and reaped by the postmaster, compounding the problem during these connection spikes.
Ultimately, the sporadic connection delays were linked to a high churn rate of background workers coinciding with the connection spikes: monitoring data showed elevated background worker shutdowns at the times the delays occurred. Reproducing this scenario in a controlled environment produced a marked drop in connection throughput, confirming the hypothesis that the single-threaded postmaster is the bottleneck under heavy load.
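The compounding effect can be illustrated by extending the single-loop model with worker churn (again a hedged sketch, not the article's experiment; the rates and per-event costs here are hypothetical): every worker exit the postmaster has to reap consumes loop time that is no longer available for accepting connections, so sustained connection throughput drops.

```python
# Toy back-of-envelope model (illustrative only): the postmaster loop has a
# fixed time budget per window, split between reaping exiting background
# workers and accepting new connections. All rates/costs are hypothetical.

def sustained_throughput(conn_rate_hz, conn_cost_s,
                         worker_exits_hz=0.0, reap_cost_s=0.0,
                         window_s=1.0):
    """Connections/second the loop can complete within the window."""
    budget = window_s
    budget -= worker_exits_hz * window_s * reap_cost_s  # time spent reaping
    capacity_hz = max(budget, 0.0) / conn_cost_s / window_s
    return min(conn_rate_hz, capacity_hz)

if __name__ == "__main__":
    conn_cost = 1.0 / 1400  # ~0.7 ms per accept+fork, from the profiling
    quiet = sustained_throughput(2000, conn_cost)
    churn = sustained_throughput(2000, conn_cost,
                                 worker_exits_hz=500,   # hypothetical churn
                                 reap_cost_s=0.0005)    # hypothetical cost
    print(f"no churn:   {quiet:.0f} conn/s")
    print(f"with churn: {churn:.0f} conn/s")
```

Even with made-up numbers, the shape matches the observation: the same connection spike that the loop could almost absorb on its own falls well short of demand once worker shutdowns compete for the same single thread.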