Links
AWS Lambda now offers improved observability for Kafka event source mappings, allowing users to monitor event polling, scaling, and processing with Amazon CloudWatch Logs and metrics. This enhancement helps troubleshoot issues quickly, reducing operational overhead and mean time to resolution. It's available for both Amazon Managed Streaming for Apache Kafka and self-managed Kafka setups.
AWS has upgraded CloudWatch to consolidate operational, security, and compliance logs from multiple accounts and sources into a unified platform. The new features support querying logs directly in Amazon S3 without ETL, making it easier for organizations to manage their log data while reducing costs and complexity. However, there are concerns about vendor lock-in, since the consolidated platform ties users closely to the AWS ecosystem.
New Relic migrated its Lambda Extension from Go to Rust, resulting in a 40% reduction in billed duration and improved memory efficiency. The rewrite also enhanced reliability and introduced a more robust telemetry pipeline.
AWS DevOps Agent is a new tool that automates incident response by correlating data from various operational tools to identify root causes and recommend fixes. It helps on-call engineers manage incidents more efficiently and provides insights for long-term system improvements. The agent integrates with popular services like CloudWatch and GitHub to streamline investigations.
An overview of essential tools and technical guidance for observability and application performance monitoring (APM) on AWS. The article highlights free-to-try observability tools that integrate with AWS workflows and are billed pay-as-you-go so that usage can scale, and it emphasizes the importance of monitoring capabilities in Site Reliability Engineering (SRE).
Amazon ElastiCache now supports Valkey 8.1, introducing new features such as native Bloom filter support, an enhanced hash table implementation, and the COMMANDLOG feature for improved performance and observability. These updates aim to enhance application responsiveness while reducing infrastructure costs. The new version is available at no extra cost and allows for easy upgrades without downtime.
Grafana has updated its Prometheus data source to better align with specific cloud services, deprecating AWS and Microsoft Azure authentication in favor of dedicated plugins for Amazon and Azure. This move reflects Grafana's commitment to a "big tent" philosophy, emphasizing interoperability and tailored solutions for diverse observability tools while continuing to support the open-source community.
A significant AWS outage on October 19-20, 2025, caused by a DNS failure in the DynamoDB API, disrupted more than 140 AWS services and affected major platforms and clients. The incident highlights the importance of observability in quickly detecting and resolving such failures, and the article argues that organizations using Full-Stack Observability can mitigate financial losses and improve response times during outages. Effective monitoring and real-time visibility into service impacts are crucial for managing risks in cloud environments.
Observability on AWS Lambda requires care because its serverless execution model complicates monitoring and debugging. This guide explores the challenges of implementing OpenTelemetry with AWS Lambda, offers insights into instrumentation methods such as AWS Distro for OpenTelemetry (ADOT) and custom SDKs, and discusses deployment options for telemetry data collection, while emphasizing the importance of understanding the Lambda execution lifecycle.
The article discusses how to monitor agentic AI applications using Amazon CloudWatch, highlighting the importance of observability for ensuring reliability and performance. It details the setup of a sample Weather Forecaster application built with Strands Agents SDK, which utilizes CloudWatch to collect telemetry data, including metrics, traces, and logs, for comprehensive analysis. Additionally, it provides a step-by-step guide for deploying the application and analyzing the generated telemetry data in the CloudWatch console.
Zeta, a core banking technology provider, improved its incident response times by over 80% by implementing a unified observability solution using Amazon OpenSearch Service. The new system, known as Customer Service Navigator (CSN), enhances operational visibility across their multi-tenant architecture, enabling faster troubleshooting and compliance with regulatory requirements. Key features include real-time monitoring and streamlined data ingestion from various AWS services, significantly reducing mean time to resolution for incidents.