3 min read | Saved February 14, 2026
Do you care about this?
Cloudflare suffered a global outage after a database permission update caused 5xx errors across its services. The issue stemmed from a regression that produced duplicate data in the Bot Management system, which pushed a configuration file past its memory limit and crashed the service. Cloudflare has since restored service and is reviewing its systems to prevent similar failures.
If you do, here's more
On November 18, 2025, Cloudflare faced a global outage triggered by a database permission update, resulting in widespread 5xx errors across its services. The disruption began at 11:20 UTC and affected customer access to websites, even locking Cloudflare's own team out of their internal systems. CEO Matthew Prince explained that the issue stemmed from a regression in their ClickHouse database cluster: a change intended to enhance security inadvertently altered how metadata queries returned data, causing them to emit duplicate rows. The duplicates inflated a Bot Management configuration file beyond the module's memory limit, crashing it.
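The failure mode described above can be sketched in a few lines. Everything here is illustrative: the function names, the 200-feature limit, and the row format are assumptions for the sake of the example, not Cloudflare's actual code or values.

```python
# Hypothetical sketch: a metadata query that starts returning duplicate
# rows doubles the size of a generated feature file, blowing past a
# fixed, preallocated limit instead of degrading gracefully.

FEATURE_LIMIT = 200  # assumed preallocation limit, illustrative only


def build_feature_file(rows):
    """Build the feature list from query rows.

    The builder implicitly assumes one row per feature; it never
    deduplicates, so duplicate rows double the output size.
    """
    features = [name for name, _db in rows]
    if len(features) > FEATURE_LIMIT:
        # In the scenario above, the consuming module crashed at this
        # point rather than falling back to a previous good file.
        raise MemoryError(
            f"{len(features)} features exceed limit {FEATURE_LIMIT}"
        )
    return features


# Before the permission change: one row per feature -> fits easily.
clean = [(f"feature_{i}", "default") for i in range(150)]

# After: the same features become visible via a second database,
# so the query returns every row twice.
duplicated = clean + [(name, "r0") for name, _ in clean]
```

Calling `build_feature_file(clean)` succeeds, while `build_feature_file(duplicated)` raises, which mirrors how an unchanged consumer can be broken purely by an upstream query returning extra rows.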
The engineers struggled to pinpoint the problem because the database updates were rolled out gradually, causing sporadic shifts between functional and non-functional states. This led the team to initially suspect a large-scale DDoS attack rather than an internal bug. Compounding the confusion, Cloudflare's external status page went down, giving the impression that the issue was more widespread. The outage was significant, being the company's worst since 2019, and highlighted how reliant many websites are on Cloudflare's infrastructure.
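The flapping behavior during the gradual rollout can be sketched as well, under the same illustrative assumptions as the previous example: while only some replicas have the new permissions, each periodic rebuild may hit either an upgraded replica (duplicate rows) or an old one, so the system alternates between healthy and failing.

```python
# Hypothetical sketch: mid-rollout, rebuilds alternate between replicas
# with and without the permission change, so outcomes flap between
# "ok" and "crash" -- easy to mistake for an external attack.

FEATURES = [f"feature_{i}" for i in range(150)]
LIMIT = 200  # assumed fixed limit, illustrative only


def rows_from(replica_upgraded):
    """Return the metadata rows a given replica would produce."""
    rows = list(FEATURES)
    if replica_upgraded:
        rows += FEATURES  # upgraded replicas expose a second copy
    return rows


# A cluster mid-rollout: replicas upgraded one at a time.
replica_states = [False, True, False, True]

outcomes = []
for upgraded in replica_states:
    rows = rows_from(upgraded)
    outcomes.append("ok" if len(rows) <= LIMIT else "crash")

print(outcomes)  # ['ok', 'crash', 'ok', 'crash']
```

The alternating outcomes illustrate why the symptoms looked intermittent from the outside: nothing about any single rebuild was random, yet the observable result depended on which replica answered.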
In the aftermath, industry voices like Dicky Wong, CEO of Syber Couture, emphasized the importance of multi-vendor strategies to avoid single points of failure. He likened the situation to a marriage without a prenup, suggesting that businesses need to adopt more resilient approaches. Other professionals echoed this sentiment, arguing against the trend of relying on a single vendor, which can lead to vulnerabilities during outages. By 14:30 UTC, service was restored after Cloudflare manually pushed a stable configuration, and the company announced plans to reassess its memory management practices to better handle similar issues in the future.