7 links
tagged with all of: aws + observability
Links
Explore essential tools and technical guidance for observability and application performance monitoring (APM) on AWS. The article highlights free-to-try observability tools that integrate with AWS workflows, stresses the role of monitoring in Site Reliability Engineering (SRE), and notes pay-as-you-go pricing that scales with use.
Amazon ElastiCache now supports Valkey 8.1, introducing new features such as native Bloom filter support, enhanced hash table implementation, and the COMMANDLOG feature for improved performance and observability. These updates aim to enhance application responsiveness while reducing infrastructure costs. The new version is available at no extra cost and allows for easy upgrades without downtime.
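For readers unfamiliar with the data structure mentioned above, here is a minimal pure-Python sketch of how a Bloom filter behaves. It is not Valkey's or ElastiCache's implementation, and the class name, method names, and default sizes are all hypothetical choices made for illustration. The point it shows is the trade-off: a small bit array can answer "definitely not seen" or "possibly seen", with no false negatives.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter; names and defaults are assumptions, not Valkey's API."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3) -> None:
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        # One bit per slot, packed into bytes; all bits start at zero.
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item: str) -> None:
        # Set every bit position derived from the item.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely never added"; True means "possibly added".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )
```

An item that was added always reports as possibly present, while an empty filter rejects everything. False positives become more likely as the bit array fills up, which is why size and hash count are tuned to the expected number of items.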
Grafana has updated its Prometheus data source to better align with specific cloud services, deprecating AWS and Microsoft Azure authentication in favor of dedicated plugins for Amazon and Azure. This move reflects Grafana's commitment to a "big tent" philosophy, emphasizing interoperability and tailored solutions for diverse observability tools while continuing to support the open-source community.
A significant AWS outage on October 19-20, 2025, caused by a DNS failure in the DynamoDB API, led to widespread disruptions across over 140 AWS services, affecting major platforms and clients. The incident highlights the importance of observability in quickly detecting and resolving such failures. According to the article, organizations using Full-Stack Observability can mitigate financial losses and improve response times during outages, and effective monitoring with real-time visibility into service impacts is crucial for managing risk in cloud environments.
Observability on AWS Lambda needs careful design because the serverless execution model complicates monitoring and debugging. This guide explores the challenges of implementing OpenTelemetry with AWS Lambda, covers instrumentation methods such as AWS Distro for OpenTelemetry (ADOT) and custom SDKs, and discusses deployment options for telemetry data collection, all while emphasizing the importance of understanding the Lambda execution lifecycle.
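To illustrate why the execution lifecycle matters for telemetry, the sketch below uses only the standard library and no OpenTelemetry calls. It relies on one known behavior: module-level code runs once per execution environment, while the handler runs once per invocation. The variable names and the returned fields are hypothetical; a real setup would more likely attach this information to a span or log record.

```python
import time

# Module scope: executed once when the execution environment initializes,
# then reused by every invocation that lands on the same environment.
_ENVIRONMENT_STARTED_AT = time.time()
_invocation_count = 0


def handler(event, context):
    """Hypothetical handler that reports whether this invocation was a cold start."""
    global _invocation_count
    _invocation_count += 1

    # The first invocation in a fresh environment is the cold start.
    cold_start = _invocation_count == 1

    return {
        "cold_start": cold_start,
        "invocation": _invocation_count,
        # Age of the environment in seconds, handy when correlating latency spikes.
        "environment_age_s": time.time() - _ENVIRONMENT_STARTED_AT,
    }
```

Calling `handler({}, None)` twice in the same process reports a cold start only on the first call, which mirrors how a warm environment retains state between invocations.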
The article discusses how to monitor agentic AI applications using Amazon CloudWatch, highlighting the importance of observability for ensuring reliability and performance. It details the setup of a sample Weather Forecaster application built with Strands Agents SDK, which utilizes CloudWatch to collect telemetry data, including metrics, traces, and logs, for comprehensive analysis. Additionally, it provides a step-by-step guide for deploying the application and analyzing the generated telemetry data in the CloudWatch console.
Zeta, a core banking technology provider, improved its incident response times by over 80% by implementing a unified observability solution using Amazon OpenSearch Service. The new system, known as Customer Service Navigator (CSN), enhances operational visibility across their multi-tenant architecture, enabling faster troubleshooting and compliance with regulatory requirements. Key features include real-time monitoring and streamlined data ingestion from various AWS services, significantly reducing mean time to resolution for incidents.