5 links tagged with all of: monitoring + debugging
Links
Pinterest hit a significant performance issue while migrating its search infrastructure, Manas, to Kubernetes: roughly one in a million search requests suffered latency spikes. The investigation traced the spikes to excessive contention caused by cAdvisor's memory-metric collection. The team resolved the issue by disabling the offending cAdvisor metric, letting the migration continue without compromising performance.
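Catching a one-in-a-million outlier starts with instrumentation that flags the rare slow request so it can be correlated with whatever else ran on the node at that moment (such as a monitoring agent's expensive collection). A minimal Go sketch of that idea follows; the middleware, the 100ms threshold, and the `/search` route are illustrative assumptions, not details from Pinterest's post.

```go
package main

import (
	"log"
	"net/http"
	"time"
)

// slowRequestLogger logs requests that exceed a latency threshold so rare
// spikes leave a timestamped trace to correlate with node-level activity.
// The 100ms cutoff is an assumption for illustration.
func slowRequestLogger(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		next.ServeHTTP(w, r)
		if d := time.Since(start); d > 100*time.Millisecond {
			log.Printf("slow request: %s %s took %s", r.Method, r.URL.Path, d)
		}
	})
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/search", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok"))
	})
	log.Fatal(http.ListenAndServe(":8080", slowRequestLogger(mux)))
}
```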
The article walks through a memory regression encountered while developing a Go application: how it was detected, the debugging steps used to isolate it, and how it was ultimately resolved. It underlines the value of continuously monitoring memory usage so regressions surface early, and describes the techniques used to track this one down.
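For monitoring memory in a Go process, the standard building blocks are `runtime.ReadMemStats` for periodic gauges and `net/http/pprof` for on-demand heap profiles. The sketch below is a minimal example of that setup, not necessarily the article's own approach; the port and logging interval are assumptions.

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // exposes /debug/pprof/heap for heap profiles
	"runtime"
	"time"
)

func main() {
	// Expose pprof so heap profiles can be pulled while a regression reproduces.
	go func() { log.Println(http.ListenAndServe("localhost:6060", nil)) }()

	// Periodically log coarse memory gauges; a regression shows up as a
	// steady climb in heap_inuse across releases or over time.
	var m runtime.MemStats
	for range time.Tick(10 * time.Second) {
		runtime.ReadMemStats(&m)
		log.Printf("heap_alloc=%d MiB heap_inuse=%d MiB num_gc=%d",
			m.HeapAlloc>>20, m.HeapInuse>>20, m.NumGC)
	}
}
```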
Sentry provides monitoring and debugging tools for AI applications, letting developers quickly identify and resolve issues with LLMs, API failures, and performance slowdowns. Real-time alerts and detailed visibility into agent operations help keep AI features reliable while keeping costs under control, and the SDK integrates with minimal setup.
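The agent-specific features are Sentry product functionality; what the SDK side looks like is just standard error capture. Below is a minimal sketch using the real `sentry-go` package, with a placeholder DSN and a made-up error message standing in for an LLM/API failure.

```go
package main

import (
	"errors"
	"log"
	"time"

	"github.com/getsentry/sentry-go"
)

func main() {
	err := sentry.Init(sentry.ClientOptions{
		Dsn:              "https://examplePublicKey@o0.ingest.sentry.io/0", // placeholder DSN
		TracesSampleRate: 0.2, // sample a fraction of transactions for performance data
	})
	if err != nil {
		log.Fatalf("sentry.Init: %v", err)
	}
	defer sentry.Flush(2 * time.Second) // send buffered events before exit

	// Report a failure (e.g., an upstream model API error) with full context.
	sentry.CaptureException(errors.New("upstream model API timed out"))
}
```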
Atla is an evaluation tool for developers that not only identifies issues within agents but also provides detailed guidance on how to resolve them quickly. It offers real-time monitoring, automatic clustering of failure patterns, and systematic improvement, helping teams improve user experience without introducing new problems. Users can deploy changes with confidence by comparing performance across different versions of their agents.
Sharing a single Redis cache cluster across multiple services can lead to significant issues: memory pressure from one service can evict another's keys, and it becomes hard to attribute load or errors to a single owner, complicating monitoring and debugging. While a shared cluster may seem simpler initially, this approach creates confusion and performance problems as the system scales. A shared cache is acceptable in some cases, but it's often better to maintain separate clusters for improved reliability and clarity.
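In code, the separate-clusters approach just means each service holds its own client pointed at its own endpoint, so eviction policy, capacity, and metrics each have a single owner. A minimal sketch with the real `go-redis` v9 client follows; the cluster addresses and keys are hypothetical.

```go
package main

import (
	"context"
	"time"

	"github.com/redis/go-redis/v9"
)

// One client per service-owned cluster (hypothetical addresses): memory
// pressure in the search cache can no longer evict profile keys, and each
// cluster's metrics map cleanly to one service.
var (
	searchCache  = redis.NewClient(&redis.Options{Addr: "search-cache:6379"})
	profileCache = redis.NewClient(&redis.Options{Addr: "profile-cache:6379"})
)

func main() {
	ctx := context.Background()
	// Each service writes only to its own cluster; TTLs and eviction
	// policy are tuned independently per workload.
	searchCache.Set(ctx, "query:recent", "serialized-results", 5*time.Minute)
	profileCache.Set(ctx, "user:42", "serialized-profile", time.Hour)
}
```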