6 min read
|
Saved February 14, 2026
Do you care about this?
Cloudflare experienced significant network failures in November and December 2025, prompting the company to launch a "Code Orange: Fail Small" initiative. The plan aims to make the network more resilient through controlled rollouts of configuration changes, better failure handling, and streamlined emergency response processes.
If you do, here's more
On November 18, 2025, Cloudflare's network faced a major outage lasting over two hours, followed by another incident on December 5 that affected 28% of applications for about 25 minutes. In response, Cloudflare initiated a resilience plan named "Code Orange: Fail Small." This plan aims to enhance the network's robustness against errors that could lead to significant outages. The company prioritized this effort to regain customer trust and prevent future incidents.
The plan focuses on three key areas: implementing controlled rollouts for configuration changes, improving the failure modes of systems that handle network traffic, and revising internal protocols for urgent actions. Cloudflare currently deploys configuration changes across its network almost instantaneously, and in both incidents a bad configuration change spread rapidly and caused widespread failures. The company recognizes that configuration changes that alter network behavior should be treated with the same caution as software updates, which undergo rigorous testing and controlled deployment.
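One way to picture a safer failure mode is a process that validates each incoming configuration and keeps serving with the last known good one when validation fails, rather than crashing. The following is a minimal Python sketch under stated assumptions: the `ConfigHolder` class, the `features` field, and the `MAX_FEATURES` limit are hypothetical names chosen for illustration, not Cloudflare's actual code or configuration format.

```python
import json
from typing import Any, Dict

MAX_FEATURES = 3  # hypothetical limit, chosen only for this example


def validate(config: Any) -> None:
    """Raise ValueError if the candidate config is malformed or too large."""
    if not isinstance(config, dict):
        raise ValueError("config must be a JSON object")
    features = config.get("features")
    if not isinstance(features, list):
        raise ValueError("features must be a list")
    if len(features) > MAX_FEATURES:
        raise ValueError("too many features")


class ConfigHolder:
    """Holds the active config; rejects bad updates instead of failing."""

    def __init__(self, initial: Dict[str, Any]) -> None:
        validate(initial)
        self.active = initial
        self.rejected = 0  # count of updates that were refused

    def update(self, raw: str) -> bool:
        try:
            candidate = json.loads(raw)  # JSONDecodeError is a ValueError
            validate(candidate)
        except ValueError:
            # Fail small: keep the last known good config and keep serving.
            self.rejected += 1
            return False
        self.active = candidate
        return True


holder = ConfigHolder({"features": ["a", "b"]})
good = holder.update('{"features": ["a", "b", "c"]}')
too_big = holder.update('{"features": ["a", "b", "c", "d", "e"]}')
garbage = holder.update("not json")
print(good, too_big, garbage, holder.active, holder.rejected)
```

In this sketch a rejected update is counted rather than silently dropped, so monitoring can still alert on it while traffic continues to be served with the previous configuration.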
To address these problems, Cloudflare is adopting a Health Mediated Deployment (HMD) system for configuration updates. Under HMD, each change is monitored closely as it rolls out and is rolled back automatically if issues arise, so that potential failures are caught before they affect the wider network. Cloudflare is also examining how a failure in one service can cascade into others, with the aim of containing problems and keeping the network as a whole stable.
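The idea of a health-gated rollout with automatic rollback can be sketched in a few lines of Python. This is a toy simulation under stated assumptions, not a description of how HMD is implemented: the function `staged_rollout`, the node names, the stage layout, and the health check are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def staged_rollout(
    stages: List[List[str]],
    apply: Callable[[str, str], None],
    is_healthy: Callable[[str], bool],
    new_version: str,
    old_version: str,
) -> bool:
    """Apply new_version stage by stage; roll back on the first unhealthy stage."""
    updated: List[str] = []
    for stage in stages:
        for node in stage:
            apply(node, new_version)
            updated.append(node)
        # Health gate: check this stage before widening the blast radius.
        if not all(is_healthy(node) for node in stage):
            for node in reversed(updated):
                apply(node, old_version)
            return False
    return True


# Toy environment: version "v2" breaks any node whose name starts with "edge".
deployed: Dict[str, str] = {}
history: List[Tuple[str, str]] = []  # every (node, version) change made


def apply(node: str, version: str) -> None:
    deployed[node] = version
    history.append((node, version))


def is_healthy(node: str) -> bool:
    return not (deployed[node] == "v2" and node.startswith("edge"))


stages = [["canary-1"], ["edge-1", "edge-2"], ["edge-3", "edge-4", "edge-5"]]
for stage in stages:
    for node in stage:
        deployed[node] = "v1"

ok = staged_rollout(stages, apply, is_healthy, "v2", "v1")
touched = {node for node, _ in history}
print(ok, sorted(touched), deployed)
```

Here the bad version passes the canary stage, fails the health gate in the second stage, and is rolled back before the largest stage is ever touched, which is the "fail small" property the plan is after.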