Links
AWS suffered a major outage on October 19-20 caused by a race condition in DynamoDB’s DNS management, which disrupted multiple services in the Northern Virginia region. Although the incident itself was brief, many customers continued to experience issues for up to 15 hours, prompting discussion of AWS reliability and future improvements.
Amazon Route 53 Resolver now allows private access through AWS PrivateLink, enabling users to manage its features without using the public internet. This includes operations like creating and managing DNS records securely over the Amazon network. It is available in all VPCs and supports various AWS regions, including GovCloud.
AWS has introduced a feature that allows customers to make DNS changes within 60 minutes during service disruptions in its US East region. The feature follows repeated outages in the region and addresses the need for greater reliability, especially for businesses in regulated industries. However, a 60-minute recovery time still leaves room for significant service interruptions.
AWS introduced Accelerated Recovery for Route 53, allowing DNS changes within 60 minutes during service disruptions in the US East (N. Virginia) region. This feature helps businesses maintain continuity by enabling critical DNS management even when facing outages. Users can easily enable it through the AWS Management Console without changing existing setups.
A DNS race condition in Amazon's DynamoDB system caused a significant outage that disrupted major websites and services, resulting in potential damages reaching hundreds of billions of dollars. The issue stemmed from a failure in the automated DNS management system, leading to widespread DNS failures and affecting various AWS services. Amazon has since disabled the affected systems and is working to implement safeguards against a recurrence.
Amazon Route 53 Resolver endpoints now support DNS delegation for private hosted zones, allowing users to delegate authority for subdomains between on-premises infrastructure and the cloud. This simplifies DNS management for organizations by removing the need for complex conditional forwarding rules. The feature is available at no additional cost in supported AWS regions.
AWS experienced a significant outage on October 20, caused primarily by DNS issues that the article links to the departure of senior engineers and the resulting loss of institutional knowledge. Many internet services were disrupted, highlighting the potential consequences of a talent drain within AWS. The situation raises questions about the company's ability to handle future incidents with a less experienced workforce.
Amazon Web Services experienced a significant outage on Monday, affecting numerous major websites including Disney+, Reddit, and United Airlines. Although most services were restored within hours, the outage highlighted the fragility of reliance on major cloud providers, with AWS confirming it was caused by DNS issues related to its DynamoDB service.
AWS experienced a significant outage caused by a major DNS failure linked to a race condition within DynamoDB's infrastructure, affecting users globally for over 14 hours. The race condition led to the accidental deletion of all IP addresses for the database service's regional endpoint, causing widespread connectivity issues. In response, Amazon has implemented measures to prevent a recurrence and apologized to customers for the disruption.
The article discusses a significant 14-hour outage of AWS's us-east-1 region, which affected 140 services including EC2, due to a latent race condition in the DynamoDB DNS management system. The author analyzes the outage's causes and emphasizes the complexity and critical nature of AWS's infrastructure, suggesting that oversimplified explanations do not capture the depth of the incident.