Click any tag below to further narrow down your results
Links
The article critiques Cloudflare's response to a recent global outage, highlighting flaws in their root cause analysis that overlook fundamental database issues. It argues that the outage stems from a mismatch between application logic and database schema, suggesting that Cloudflare needs to focus on logical design rather than just physical replication to prevent future incidents.
On November 18, 2025, Cloudflare experienced a significant outage due to a change in database permissions that led to an oversized feature file for their Bot Management system. This caused widespread HTTP 5xx errors across various services until the issue was resolved later that day. The article details the incident, its impact, and steps for future prevention.
Cloudflare faced a global outage due to a database permission update that caused 5xx errors across its services. The issue stemmed from a regression that led to duplicate data in the Bot Management system, overwhelming memory limits and crashing the service. Cloudflare has since restored service and is reviewing its systems to prevent similar issues.