Quit Emailing Yourself

Race Condition in DynamoDB DNS System: Analyzing the AWS US-EAST-1 Outage

AWS suffered a major outage on October 19–20 caused by a race condition in DynamoDB's DNS management, disrupting multiple services in the Northern Virginia (us-east-1) region. Although the DynamoDB DNS fault itself was resolved within a few hours, many customers experienced issues for up to 15 hours, prompting discussions about AWS reliability and future improvements.

Saved by tldr-importer · Last saved February 14, 2026 · 2 min read

aws · dynamodb · outage · reliability · dns

Entropic Thoughts

This article examines a recent AWS DynamoDB outage caused by a latent race condition in the DNS management system. It discusses how applying System-Theoretic Process Analysis (STPA) could have identified potential issues before the outage occurred, highlighting the importance of proactive analysis in software reliability.

Saved by tldr-importer · Last saved February 14, 2026 · 7 min read

aws · dynamodb · outage · stpa · analysis

A single DNS race condition brought AWS to its knees

A DNS race condition in Amazon's DynamoDB system caused a significant outage that disrupted major websites and services, with potential damages that could reach hundreds of billions of dollars. The issue stemmed from a fault in DynamoDB's automated DNS management system, which triggered widespread DNS resolution failures that cascaded into other AWS services. Amazon has since disabled the affected automation and is working to implement safeguards against a recurrence.

Saved by tldr-importer · Last saved October 29, 2025 · 2 min read

amazon · aws · outage · dynamodb · dns

[no-title]

An AWS outage caused significant disruptions to various popular services, including Alexa, Fortnite, and Snapchat, leaving many users unable to access these platforms. The incident highlights how heavily everyday activities and businesses depend on cloud services, and how disruptive downtime can be.

Saved by tldr-importer · Last saved October 29, 2025 · 1 min read

aws · outage · alexa · fortnite · snapchat

AWS Outage And Why O11y is Non Negotiable

A significant AWS outage on October 19–20, 2025, caused by a DNS failure affecting the DynamoDB API endpoint, led to widespread disruptions across more than 140 AWS services, affecting major platforms and clients. The article argues that observability is key to detecting and resolving such failures quickly, and that organizations using Full-Stack Observability can limit financial losses and improve response times during outages. Effective monitoring and real-time visibility into service impacts are crucial for managing risk in cloud environments.

Saved by tldr-importer · Last saved October 29, 2025 · 6 min read

aws · outage · observability · cloud · monitoring

Amazon says AWS cloud service back to normal after outage disrupts businesses worldwide | Reuters

Amazon's cloud service, AWS, experienced a significant outage affecting numerous popular websites and applications, including Snapchat and Reddit. Although services have returned to normal, a backlog of messages is still being processed, and the incident highlights the risks of relying on a few major cloud providers.

Saved by tldr-importer · Last saved October 29, 2025 · 5 min read

aws · outage · cloud · services · disruption

Amazon brain drain finally caught up with AWS

AWS experienced a significant outage on October 20 rooted in DNS issues, which the author links to the departure of senior engineers and the resulting loss of institutional knowledge. Many internet services were disrupted, illustrating the potential consequences of a talent drain within AWS and raising questions about the company's ability to handle future incidents with a less experienced workforce.

Saved by tldr-importer · Last saved October 29, 2025 · 5 min read

aws · outage · dns · brain-drain · institutional-knowledge

[no-title]

A massive outage at Amazon Web Services (AWS) on October 20, 2025, caused widespread disruptions to various internet services globally, affecting numerous businesses and users. The incident highlighted the reliance on cloud services and raised concerns over their stability and resilience. Users experienced significant interruptions, leading to discussions about the implications for digital infrastructure.

Saved by tldr-importer · Last saved October 29, 2025 · 1 min read

aws · outage · internet · services · cloud

AWS services recover after daylong outage hits major sites

Amazon Web Services experienced a significant outage on Monday, affecting numerous major websites including Disney+, Reddit, and United Airlines. Although most services were restored within hours, the outage highlighted the fragility of reliance on major cloud providers, with AWS confirming it was caused by DNS issues related to its DynamoDB service.

Saved by tldr-importer · Last saved October 29, 2025 · 5 min read

aws · outage · cloud-computing · dns · infrastructure

Amazon: This week’s AWS outage caused by major DNS failure

Amazon's AWS experienced a significant outage caused by a major DNS failure, which was linked to a race condition in DynamoDB's infrastructure and affected users globally for over 14 hours. The race condition led to the accidental deletion of all IP addresses from the DNS record for the database service's regional endpoint, causing widespread connectivity issues. Amazon has since implemented measures to prevent a recurrence and apologized to customers for the disruption.

Saved by hn_user_14 · Last saved October 28, 2025 · 3 min read

aws · outage · dns

AWS outage: Myths vs reality • The Register

The article critiques popular misconceptions surrounding the recent AWS outage, emphasizing that it was not caused by AI and highlighting the pitfalls of adopting a multi-cloud strategy. It discusses the complexities of maintaining cloud systems and the importance of understanding the root causes of outages rather than relying on simplistic explanations or excuses.

Saved by hn_user_12 · Last saved October 28, 2025 · 3 min read

aws · outage · cloud computing

AWS Cloud-Computing Outage Left Smart Bed Customers Without Sleep - The New York Times

An outage at Amazon Web Services left users of Eight Sleep's smart mattresses unable to access temperature controls, resulting in uncomfortable nights. Customers reported waking up distressed as they lost access to the app that regulates their sleep environment. The incident highlighted the vulnerabilities of smart home technology reliant on cloud services.

Saved by hn_user_9 · 2 others saved this · Last saved October 28, 2025 · 2 min read

aws · outage · smart-beds · eight-sleep · technology

More Than DNS: The 14 hour AWS us-east-1 outage – Jonathon Belotti [thundergolfer]

The article discusses a 14-hour outage in the AWS us-east-1 region that affected 140 services, primarily due to a race condition in the DynamoDB DNS management system. The author analyzes the outage's causes and implications, emphasizing how interconnected AWS services are and how unexpected such failures are on a highly reliable cloud platform.

Saved by hn_user_7 · 2 others saved this · Last saved October 28, 2025 · 3 min read

aws · outage · dynamodb · dns

Links