Cloudflare's resilience plan following recent outages (Code Orange)
a day ago
- #Cloudflare
- #Resilience
- #Network Outage
- Cloudflare experienced two major network outages, in November and December 2025, the first lasting over two hours and the second 25 minutes.
- The company has initiated 'Code Orange: Fail Small' to prioritize network resilience and prevent future outages.
- Key focus areas include controlled rollouts for configuration changes, improving failure modes, and revising 'break glass' procedures.
- Both outages were triggered by instantaneous global deployment of configuration changes without proper safeguards.
- Cloudflare plans to implement Health Mediated Deployment (HMD) for configuration changes, similar to software updates.
- The company is addressing failure modes between services to ensure graceful handling of errors.
- Efforts are underway to improve emergency response times by revising security access and removing circular dependencies.
- By Q1 2026, Cloudflare aims to have all production systems covered by HMD, updated failure modes, and improved emergency access procedures.
- The incidents highlighted the need for treating configuration changes with the same caution as software updates.
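
The health-gated rollout idea in the bullets above can be illustrated with a minimal Python sketch. It is not Cloudflare's HMD implementation: the stage names, the `staged_rollout` function, and the `is_healthy` probe are all hypothetical, chosen only to show a configuration change advancing one stage at a time and being rolled back as soon as a stage reports unhealthy.

```python
from typing import Callable, Dict, Sequence

# Hypothetical stage names; the summary does not describe the real rollout stages.
STAGES: Sequence[str] = ("canary", "single-region", "global")


def staged_rollout(
    active: Dict[str, str],
    new_version: str,
    is_healthy: Callable[[str], bool],
    stages: Sequence[str] = STAGES,
) -> bool:
    """Apply new_version one stage at a time, gating each step on health.

    `active` maps a stage name to the config version currently served there.
    Returns True if every stage accepted the change, False if it was rolled back.
    """
    previous = dict(active)  # snapshot of the last known good state
    for stage in stages:
        active[stage] = new_version
        if not is_healthy(stage):
            # Fail small: restore the snapshot and stop before later stages.
            active.clear()
            active.update(previous)
            return False
    return True


if __name__ == "__main__":
    fleet = {stage: "v1" for stage in STAGES}
    # Simulated health probe (an assumption): the new config breaks the canary.
    ok = staged_rollout(fleet, "v2", is_healthy=lambda stage: stage != "canary")
    print(ok, fleet)
```

In this sketch a bad change never reaches the `global` stage, which is the contrast with the instantaneous global deployment that triggered both outages.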