Cloudflare outage on November 18, 2025: postmortem
4 days ago
- #Cloudflare
- #Bot Management
- #Network Outage
- Cloudflare experienced a significant network failure on 18 November 2025, causing HTTP 5xx errors for users.
- The trigger was a permissions change in one of Cloudflare's database systems, which caused the Bot Management system's feature file to double in size.
- The software that reads the feature file enforced a size limit on it, and it failed when the doubled file exceeded that limit.
- The team initially suspected a DDoS attack, which proved incorrect; once the feature file was identified as the cause, the issue was resolved by reverting to an earlier version of the file.
- Core traffic normalized by 14:30 UTC, with full system recovery by 17:06 UTC.
- Services impacted included Core CDN, Turnstile, Workers KV, Dashboard, Email Security, and Access.
- The feature file is a configuration file that feeds the Bot Management system's machine learning model; the bad version contained duplicate feature rows.
- The duplicates came from a ClickHouse query used to generate the feature file, which began returning duplicate rows after the permissions change.
- Mitigation involved stopping the propagation of the bad file, manually inserting a known-good file, and restarting affected systems.
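- Cloudflare outlined follow-up hardening measures, including validating its own generated configuration files as strictly as user-generated input and adding more global kill switches for features.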
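The way a permissions change can produce duplicate rows without any edit to the query itself can be illustrated with a small Python simulation. It assumes, hypothetically, that the feature list is built from a column-metadata query that filters by table name but not by database name. Under that assumption, once the permissions change makes a second database visible, every column appears twice. The database and table names (`default`, `r0`, `http_requests_features`) and the `ColumnRow`, `visible_rows`, and `feature_columns` helpers are illustrative stand-ins, not Cloudflare's actual schema or code.

```python
from dataclasses import dataclass

TABLE = "http_requests_features"  # illustrative table name


@dataclass(frozen=True)
class ColumnRow:
    """One row of hypothetical column metadata: which database/table owns a column."""
    database: str
    table: str
    name: str


# Illustrative metadata: the same table's columns exist in two databases.
ALL_ROWS = [
    ColumnRow("default", TABLE, "feature_a"),
    ColumnRow("default", TABLE, "feature_b"),
    ColumnRow("r0", TABLE, "feature_a"),
    ColumnRow("r0", TABLE, "feature_b"),
]


def visible_rows(rows, permitted_databases):
    """Simulate permissions: a query only sees databases the account may access."""
    return [r for r in rows if r.database in permitted_databases]


def feature_columns(rows, table, database=None):
    """Return column names for `table`; filter by `database` only if one is given."""
    return [
        r.name
        for r in rows
        if r.table == table and (database is None or r.database == database)
    ]


# Before the permissions change: only `default` is visible, so no duplicates.
before = feature_columns(visible_rows(ALL_ROWS, {"default"}), TABLE)
# After: `r0` is also visible, and the unfiltered query returns every column twice.
after = feature_columns(visible_rows(ALL_ROWS, {"default", "r0"}), TABLE)
# Pinning the database keeps the output stable under either permission set.
pinned = feature_columns(
    visible_rows(ALL_ROWS, {"default", "r0"}), TABLE, database="default"
)

print(before)  # ['feature_a', 'feature_b']
print(after)   # ['feature_a', 'feature_b', 'feature_a', 'feature_b']
print(pinned)  # ['feature_a', 'feature_b']
```

In this sketch the output doubles exactly, matching the doubled file size in the summary. Pinning the database, or de-duplicating the query output, is one way to make the generated file independent of which databases the querying account can see. It is an illustration of the failure mode, not a description of Cloudflare's fix.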
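The contrast between a loader that fails hard on an oversized feature file and a hardened one that falls back to a known-good file can be sketched as follows. The sketch assumes a loader with a fixed, preallocated capacity. `MAX_FEATURES = 200` and the feature counts are hypothetical values chosen for illustration, and `load_features_strict`, `validate`, and `FeatureStore` are hypothetical names. The strict loader raises on an oversized file, standing in for the hard failure summarized above. The hardened store validates each generated file as if it were untrusted input and keeps serving the last known-good version when validation fails, which mirrors reverting to an earlier version of the file.

```python
# Hypothetical preallocated capacity, chosen for illustration only.
MAX_FEATURES = 200


class FeatureLimitExceeded(Exception):
    """Raised by the strict loader when a file has more features than it can hold."""


def load_features_strict(names):
    """Brittle behaviour: an oversized file is a hard error for the caller."""
    if len(names) > MAX_FEATURES:
        raise FeatureLimitExceeded(
            f"{len(names)} features exceed the limit of {MAX_FEATURES}"
        )
    return list(names)


def validate(names):
    """Treat a generated file like untrusted input: reject duplicates and oversize."""
    if len(names) != len(set(names)):
        raise ValueError("duplicate feature names")
    if len(names) > MAX_FEATURES:
        raise ValueError(f"{len(names)} features exceed the limit of {MAX_FEATURES}")
    return list(names)


class FeatureStore:
    """Serves the last known-good feature list; only validated updates replace it."""

    def __init__(self, initial):
        self.current = validate(initial)
        self.rejected = 0

    def update(self, candidate):
        try:
            self.current = validate(candidate)
            return True
        except ValueError:
            # Keep serving the previous good file instead of failing.
            self.rejected += 1
            return False


good = [f"feature_{i}" for i in range(120)]  # illustrative count
bad = good + good  # every row duplicated: 240 entries

store = FeatureStore(good)
accepted = store.update(bad)
print(accepted, len(store.current), store.rejected)  # False 120 1

strict_error = None
try:
    load_features_strict(bad)
except FeatureLimitExceeded as exc:
    strict_error = exc
print("strict loader:", strict_error)  # strict loader: 240 features exceed the limit of 200
```

The design choice here is to trade freshness for availability. A rejected update leaves the previous list in service, which is usually preferable to failing requests when a configuration file is distributed automatically across many machines.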