
How when AWS was down, we were not | Authress - Knowledge Base

7 min read | Saved February 14, 2026

aws, reliability, outages, authentication, architecture

Do you care about this?

This article explains how Authress maintained service availability despite the significant AWS outage on October 20th. It discusses the importance of reliability in their authentication services and the architectural strategies they implemented to achieve a five-nines SLA.

If you do, here's more

On October 20th, a major AWS outage in the us-east-1 region disrupted many services, including DynamoDB, which saw elevated error rates. The incident affected high-profile companies such as Disney+, Lyft, and Reddit. The author notes that although Authress's infrastructure runs on AWS, it must support customers who choose to operate in us-east-1 despite that region's vulnerabilities. Because Authress provides authentication and access control, the author stresses that high reliability is critical and targets a five-nines SLA (99.999% availability), which allows just over five minutes of downtime per year.
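The five-minute figure follows directly from the availability target. The sketch below is only an illustration of that arithmetic, not anything taken from the article; the function name `downtime_budget_minutes` and the choice of an average 365.25-day year are assumptions made here.

```python
# Assumption: an "average" year of 365.25 days, to account for leap years.
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_budget_minutes(availability: float) -> float:
    """Return the downtime allowed per year, in minutes, for an availability fraction."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


# Compare a few common targets; five nines leaves roughly 5.26 minutes per year.
for label, target in [("three nines", 0.999), ("four nines", 0.9999), ("five nines", 0.99999)]:
    print(f"{label}: {downtime_budget_minutes(target):.2f} minutes/year")
```

Each additional nine cuts the budget by a factor of ten, which is why a single multi-hour regional outage can consume many years' worth of a five-nines allowance.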

The article then examines how hard it is to reach that level of reliability on cloud infrastructure that itself suffers frequent outages. Past incidents caused by hardware failures, human error, and natural disasters illustrate the risk of relying solely on AWS services. The author argues against depending on AWS's own SLAs, which fall short of five nines for several key services, including Lambda and API Gateway. The core message is clear: to meet strict reliability commitments, Authress must build resilience into its own architecture rather than rely on the cloud provider's assurances.
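To see why a provider SLA below five nines cannot simply be inherited, it can help to compose availabilities by hand. The sketch below is a simplified model, not the architecture described in the article: the 99.95% per-service figure is an illustrative assumption (check the provider's current SLA documents), the helper names are hypothetical, and failures are assumed to be independent, which is optimistic for real regional outages.

```python
from math import prod

FIVE_NINES = 0.99999

# Assumption: illustrative 99.95% availability for each dependency in the request path.
ASSUMED_SERVICE_AVAILABILITY = 0.9995


def serial_availability(components: list[float]) -> float:
    """Availability when every component must be up (independent failures assumed)."""
    return prod(components)


def redundant_availability(replica: float, count: int) -> float:
    """Availability when at least one of `count` independent replicas must be up."""
    return 1.0 - (1.0 - replica) ** count


# Two dependencies in series within one region: worse than either one alone.
single_region = serial_availability([ASSUMED_SERVICE_AVAILABILITY] * 2)

# The same stack replicated in two independent regions with failover between them.
two_regions = redundant_availability(single_region, 2)

print(f"single region: {single_region:.6f} (meets five nines: {single_region >= FIVE_NINES})")
print(f"two regions:   {two_regions:.8f} (meets five nines: {two_regions >= FIVE_NINES})")
```

Under these assumptions a single-region stack lands near three nines, while redundancy across independent regions can exceed five nines on paper. Correlated failures and imperfect failover would reduce the real figure, so the model only shows that the resilience has to come from the architecture rather than from any single SLA.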
