5 links tagged with all of: monitoring + debugging
Links
Pinterest hit a significant performance issue while migrating its search infrastructure, Manas, to Kubernetes: roughly one in a million search requests suffered latency spikes. The investigation traced the spikes to excessive contention caused by cAdvisor's memory-metric collection. The team resolved the issue by disabling the offending cAdvisor metric, letting the migration continue without compromising performance.
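Catching a one-in-a-million outlier starts with instrumentation that flags the rare slow request so it can be correlated with whatever else ran on the node at that moment (such as a monitoring agent's expensive collection). A minimal Go sketch of that idea follows; the middleware, the 100ms threshold, and the `/search` route are illustrative assumptions, not details from Pinterest's post.

```go
package main

import (
	"log"
	"net/http"
	"time"
)

// slowRequestLogger logs requests that exceed a latency threshold so rare
// spikes leave a timestamped trace to correlate with node-level activity.
// The 100ms cutoff is an assumption for illustration.
func slowRequestLogger(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		next.ServeHTTP(w, r)
		if d := time.Since(start); d > 100*time.Millisecond {
			log.Printf("slow request: %s %s took %s", r.Method, r.URL.Path, d)
		}
	})
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/search", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok"))
	})
	log.Fatal(http.ListenAndServe(":8080", slowRequestLogger(mux)))
}
```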
The article walks through a memory regression encountered while developing a Go application: how it was detected, the debugging steps used to isolate it, and how it was ultimately resolved. It underlines the value of continuously monitoring memory usage so regressions surface early, and describes the techniques used to track this one down.
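For monitoring memory in a Go process, the standard building blocks are `runtime.ReadMemStats` for periodic gauges and `net/http/pprof` for on-demand heap profiles. The sketch below is a minimal example of that setup, not necessarily the article's own approach; the port and logging interval are assumptions.

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // exposes /debug/pprof/heap for heap profiles
	"runtime"
	"time"
)

func main() {
	// Expose pprof so heap profiles can be pulled while a regression reproduces.
	go func() { log.Println(http.ListenAndServe("localhost:6060", nil)) }()

	// Periodically log coarse memory gauges; a regression shows up as a
	// steady climb in heap_inuse across releases or over time.
	var m runtime.MemStats
	for range time.Tick(10 * time.Second) {
		runtime.ReadMemStats(&m)
		log.Printf("heap_alloc=%d MiB heap_inuse=%d MiB num_gc=%d",
			m.HeapAlloc>>20, m.HeapInuse>>20, m.NumGC)
	}
}
```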
Sentry provides monitoring and debugging tools for AI applications, letting developers quickly identify and resolve issues with LLMs, API failures, and performance slowdowns. Real-time alerts and detailed visibility into agent operations help keep AI features reliable while keeping costs under control, and the SDK integrates with minimal setup.
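The agent-specific features are Sentry product functionality; what the SDK side looks like is just standard error capture. Below is a minimal sketch using the real `sentry-go` package, with a placeholder DSN and a made-up error message standing in for an LLM/API failure.

```go
package main

import (
	"errors"
	"log"
	"time"

	"github.com/getsentry/sentry-go"
)

func main() {
	err := sentry.Init(sentry.ClientOptions{
		Dsn:              "https://examplePublicKey@o0.ingest.sentry.io/0", // placeholder DSN
		TracesSampleRate: 0.2, // sample a fraction of transactions for performance data
	})
	if err != nil {
		log.Fatalf("sentry.Init: %v", err)
	}
	defer sentry.Flush(2 * time.Second) // send buffered events before exit

	// Report a failure (e.g., an upstream model API error) with full context.
	sentry.CaptureException(errors.New("upstream model API timed out"))
}
```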
Atla is an evaluation tool for developers that not only identifies issues within agents but also provides detailed guidance on how to resolve them quickly. It offers real-time monitoring, automatic clustering of failure patterns, and systematic improvement, helping teams improve user experience without introducing new problems. Users can deploy changes with confidence by comparing performance across different versions of their agents.
Sharing a single Redis cache cluster across multiple services can lead to significant issues: memory pressure from one service can evict another's keys, and it becomes hard to attribute load or errors to a single owner, complicating monitoring and debugging. While a shared cluster may seem simpler initially, this approach creates confusion and performance problems as the system scales. A shared cache is acceptable in some cases, but it's often better to maintain separate clusters for improved reliability and clarity.
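In code, the separate-clusters approach just means each service holds its own client pointed at its own endpoint, so eviction policy, capacity, and metrics each have a single owner. A minimal sketch with the real `go-redis` v9 client follows; the cluster addresses and keys are hypothetical.

```go
package main

import (
	"context"
	"time"

	"github.com/redis/go-redis/v9"
)

// One client per service-owned cluster (hypothetical addresses): memory
// pressure in the search cache can no longer evict profile keys, and each
// cluster's metrics map cleanly to one service.
var (
	searchCache  = redis.NewClient(&redis.Options{Addr: "search-cache:6379"})
	profileCache = redis.NewClient(&redis.Options{Addr: "profile-cache:6379"})
)

func main() {
	ctx := context.Background()
	// Each service writes only to its own cluster; TTLs and eviction
	// policy are tuned independently per workload.
	searchCache.Set(ctx, "query:recent", "serialized-results", 5*time.Minute)
	profileCache.Set(ctx, "user:42", "serialized-profile", time.Hour)
}
```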