Saved February 14, 2026
Do you care about this?
New Relic developed Weather Station, an internal system that performs over 100,000 connectivity checks per hour across its multi-cloud infrastructure. By continuously validating network paths, it lets engineers detect and diagnose network issues much faster than before.
If you do, here's more
New Relic operates a global observability platform that requires constant uptime, making network outages particularly dangerous. To monitor its extensive multi-cloud infrastructure, which includes hundreds of clusters and a complex network of connections, New Relic developed an internal monitoring system called Weather Station. This system performs over 100,000 connectivity checks per hour, ensuring that network paths remain operational across regions and cloud providers.
The challenge wasn't just detecting outages but quickly diagnosing specific connectivity issues. Engineers previously spent significant time manually checking connections and analyzing timestamps to identify failures. Weather Station automates this process by continuously validating critical network paths, providing immediate context when a failure occurs. It uses a dedicated monitoring network that mirrors production topology, ensuring that tests reflect real network conditions.
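To make the idea of "immediate context" concrete, here is a minimal sketch of a single automated check that records who tested what, when, and how it failed. It assumes a plain TCP connect as the probe; the `CheckResult` fields and the `tcp_check` name are hypothetical illustrations, not Weather Station's actual data model.

```python
import socket
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckResult:
    """Hypothetical record of one connectivity check (field names are assumptions)."""
    source: str                  # label of the instance running the check
    target: str                  # host that was probed
    port: int                    # TCP port that was probed
    ok: bool                     # True if the connection was established
    latency_ms: Optional[float]  # connect time in milliseconds, None on failure
    error: Optional[str]         # short failure description, None on success
    checked_at: float            # Unix timestamp when the check finished


def tcp_check(source: str, host: str, port: int, timeout: float = 2.0) -> CheckResult:
    """Try to open a TCP connection and return a timestamped result."""
    started = time.monotonic()
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            latency_ms = (time.monotonic() - started) * 1000.0
            return CheckResult(source, host, port, True, latency_ms, None, time.time())
    except OSError as exc:
        # Timeouts, refusals, and DNS failures are all OSError subclasses.
        reason = f"{type(exc).__name__}: {exc}"
        return CheckResult(source, host, port, False, None, reason, time.time())
```

Because every result carries its own source, target, timestamp, and error, an engineer reading a failure does not have to reconstruct which path broke or when, which is the manual work the article says the system replaced.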
Weather Station employs a hierarchical configuration system to adapt checks based on the instance's location and environment. It runs four types of checks: regional, inter-regional, cross-cloud, and management network checks. These checks rely primarily on ICMP ping tests and TCP port checks. Deploying Weather Station led to a 90% reduction in mean time to detect issues and a 50% improvement in mean time to resolve incidents.
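One way a hierarchical configuration could work is by layering overrides from general to specific. The sketch below assumes four layers (defaults, cloud, region, environment) merged in that order; the layer names, keys, ports, and timeouts are all hypothetical and are not taken from New Relic's implementation.

```python
from copy import deepcopy
from typing import Any, Dict

Config = Dict[str, Any]


def deep_merge(base: Config, override: Config) -> Config:
    """Return a new dict where values in `override` win, merging nested dicts."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def resolve_config(layers: Dict[str, Config], cloud: str, region: str, environment: str) -> Config:
    """Merge layers from least to most specific (the ordering is an assumption)."""
    resolved: Config = {}
    for name in ("default", f"cloud:{cloud}", f"region:{region}", f"env:{environment}"):
        resolved = deep_merge(resolved, layers.get(name, {}))
    return resolved


# Hypothetical layers covering the four check types named in the article.
LAYERS: Dict[str, Config] = {
    "default": {
        "checks": {
            "regional": {"probe": "icmp", "timeout_s": 2},
            "management": {"probe": "tcp", "port": 22, "timeout_s": 2},
        }
    },
    "cloud:aws": {"checks": {"cross_cloud": {"probe": "tcp", "port": 443, "timeout_s": 5}}},
    "region:eu-west-1": {"checks": {"inter_regional": {"probe": "icmp", "timeout_s": 4}}},
    "env:staging": {"checks": {"regional": {"timeout_s": 1}}},
}

config = resolve_config(LAYERS, cloud="aws", region="eu-west-1", environment="staging")
```

In this sketch, an instance in staging inherits the default regional ICMP check but with a tighter timeout, while the cloud and region layers each contribute a check the defaults do not define. Missing layers are simply skipped, so an instance in an unlisted region still gets a valid configuration.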