23 links
tagged with outage
Links
The article discusses a significant service outage that occurred at Cloudflare on June 12, 2025, affecting numerous websites and services globally. It details the causes of the outage, including technical failures and their impact on users and businesses. Additionally, the company outlines measures taken to prevent similar incidents in the future.
Linktree has mysteriously gone dark in India, leaving users and the company puzzled about the reasons behind this sudden service disruption. Despite attempts to understand the situation, Linktree has not provided a clear explanation for the outage.
A DNS race condition in Amazon's DynamoDB system caused a significant outage that disrupted major websites and services, resulting in potential damages reaching hundreds of billions of dollars. The issue stemmed from a failure in the automated DNS management system, leading to widespread DNS failures and affecting various AWS services. Amazon has since disabled the affected systems and is working to implement safeguards against a recurrence.
An AWS outage caused significant disruptions to various popular services, including Alexa, Fortnite, and Snapchat, leaving many users unable to access these platforms. The incident highlights the reliance on cloud services and the potential impact of downtime on everyday activities and businesses.
Microsoft is investigating a global outage affecting access to the Exchange Admin Center, designated as a critical service issue. Administrators are encountering "HTTP Error 500" when trying to log in, but some have found a workaround via a different URL. Microsoft is working on solutions and has started redirecting traffic to restore access temporarily.
A significant AWS outage on October 19-20, 2025, caused by a DNS failure in the DynamoDB API, disrupted more than 140 AWS services and affected major platforms and clients. The incident highlights the importance of observability for detecting and resolving such failures quickly, and the article emphasizes that organizations using Full-Stack Observability can mitigate financial losses and improve response times during outages. Effective monitoring and real-time visibility into service impacts are crucial for managing risk in cloud environments.
Microsoft is addressing an outage affecting its Azure Front Door CDN, which has disrupted access to various Microsoft 365 services across Europe, Africa, and the Middle East. As of the latest updates, the company has restored approximately 98% of the service and is monitoring for full recovery, with only about 4% of previously impacted customers still affected. The incident has been officially mitigated, and users have reported that their access issues are resolved.
Amazon's cloud service, AWS, experienced a significant outage affecting numerous popular websites and applications, including Snapchat and Reddit. While services have returned to normal, a backlog of messages is still being processed. The incident highlights the risk of relying on a few major cloud providers.
Cellcom has confirmed that a week-long service disruption affecting voice and text services in Wisconsin and Upper Michigan was caused by a cyberattack. The company is working with cybersecurity experts to investigate the incident, and while some services are being restored, there is no evidence that customer data was compromised.
A massive outage at Amazon Web Services (AWS) on October 20, 2025, caused widespread disruptions to various internet services globally, affecting numerous businesses and users. The incident highlighted the reliance on cloud services and raised concerns over their stability and resilience. Users experienced significant interruptions, leading to discussions about the implications for digital infrastructure.
Amazon Web Services experienced a significant outage on Monday, affecting numerous major websites including Disney+, Reddit, and United Airlines. Although most services were restored within hours, the outage highlighted the fragility of reliance on major cloud providers, with AWS confirming it was caused by DNS issues related to its DynamoDB service.
AWS experienced a significant outage on October 20, primarily due to DNS issues. The article links the incident to the departure of senior engineers and raises concerns about the company's diminishing institutional knowledge. Many internet services were disrupted, which the article presents as a potential consequence of a talent drain within AWS. It questions the company's ability to handle future incidents with a less experienced workforce.
A significant incident occurred on July 14, 2025, involving Cloudflare's 1.1.1.1 DNS service, leading to widespread internet disruptions. The article details the nature of the incident, its impact on users, and the steps taken by Cloudflare to resolve the issues.
The article discusses an outage affecting services provided by GCP (Google Cloud Platform), Cloudflare, and Anthropic, highlighting the implications for users and businesses reliant on these platforms. It examines the causes of the outage and its impact on cloud computing reliability and security.
The article discusses the recent Google Cloud outage, detailing its causes, effects on businesses and users, and the broader implications for cloud reliability. It emphasizes the consequences of such disruptions on critical operations and highlights the need for better contingency planning in cloud services.
On April 16, 2025, Spotify experienced a global outage due to a bug triggered by a change in the order of Envoy Proxy filters, which caused all Envoy instances to crash simultaneously. The incident significantly disrupted traffic everywhere except the Asia Pacific region and was eventually mitigated by increasing server capacity and addressing configuration issues. Spotify has outlined steps to prevent similar outages in the future, including bug fixes and improvements to its rollout and monitoring processes.
Cloudflare experienced a significant outage on September 12, 2023, affecting both its dashboard and its API services. The incident disrupted users who rely on these tools and led to increased scrutiny of the company's infrastructure and its response mechanisms during downtime. Cloudflare's team worked to resolve the issues and restore services as quickly as possible.
A single software bug in Amazon's DynamoDB DNS management system caused a significant outage of Amazon Web Services, affecting millions globally for over 15 hours. The failure stemmed from a race condition triggered by the interaction of two components within the system, which led to widespread service disruptions reported by thousands of organizations.
Amazon Web Services resolved a significant outage that affected over 1,000 apps and websites, including Snapchat and major banks, highlighting the risks of relying heavily on a single cloud provider. Experts emphasized the need for companies to build more resilient systems and questioned the sustainability of the current concentration of cloud services among a few major players. The outage, attributed to DNS resolution issues, sparked discussions on the vulnerabilities in the infrastructure of online services.
Amazon's AWS experienced a significant outage due to a major DNS failure linked to a race condition within DynamoDB's infrastructure, affecting users globally for over 14 hours. The incident led to the accidental deletion of all IP addresses for the database service's regional endpoint, causing widespread connectivity issues. In response, Amazon has implemented measures to prevent future occurrences and apologized for the disruption caused to customers.
The article critiques popular misconceptions surrounding the recent AWS outage, emphasizing that it was not caused by AI and highlighting the pitfalls of adopting a multi-cloud strategy. It discusses the complexities of maintaining cloud systems and the importance of understanding the root causes of outages rather than relying on simplistic explanations or excuses.
An outage at Amazon Web Services left users of Eight Sleep's smart mattresses unable to access temperature controls, resulting in uncomfortable nights. Customers reported waking up distressed as they lost access to the app that regulates their sleep environment. The incident highlighted the vulnerabilities of smart home technology reliant on cloud services.
The article discusses a significant 14-hour outage in the AWS us-east-1 region that affected 140 services, primarily due to a race condition in the DynamoDB DNS management system. The author analyzes the outage's causes and implications, emphasizing the interconnectedness of AWS services and the unexpected nature of such failures in a highly reliable cloud platform.