Hasty Briefs (beta)

Cloudflare outage on November 18, 2025 post mortem

4 days ago
  • #Cloudflare
  • #Bot Management
  • #Network Outage
  • Cloudflare experienced a significant network failure on 18 November 2025, causing HTTP 5xx errors for users.
  • The issue was triggered by a database permissions change, which caused the feature file used by the Bot Management system to double in size.
  • The software that reads the feature file enforces a size limit, and it failed once the doubled file exceeded that limit.
  • Engineers initially suspected a DDoS attack, which proved incorrect; once the core issue was identified, it was resolved by reverting to an earlier version of the feature file.
  • Core traffic normalized by 14:30 UTC, with full system recovery by 17:06 UTC.
  • Services impacted included Core CDN, Turnstile, Workers KV, Dashboard, Email Security, and Access.
  • Duplicate feature rows in the feature configuration file affected the machine learning model used by the Bot Management system.
  • The duplicate entries originated in the ClickHouse query that generates the feature file: after the permissions change, the query began returning duplicate rows.
  • Mitigation included stopping bad file propagation, manual insertion of a good file, and system restarts.
  • Cloudflare outlined future hardening measures to prevent similar outages.
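
The failure mode summarized above (a generated file grows past a hard limit and the consumer fails) can be illustrated with a small sketch. This is not Cloudflare's code. The `FeatureStore` class, the `try_update` method, the feature names, and the limit of 100 are all hypothetical, chosen only to show one common way to guard a consumer: validate each candidate file and keep the last known-good version if validation fails.

```python
class FeatureStore:
    """Holds the active feature list and rejects invalid updates.

    Hypothetical illustration only; names and limits are assumptions,
    not taken from the post mortem.
    """

    def __init__(self, max_features):
        self.max_features = max_features
        self.active = []  # last known-good feature list

    def try_update(self, candidate):
        """Return True if the candidate was accepted, False if rejected.

        On rejection the active list is left untouched, so the caller
        keeps running with the previous good configuration.
        """
        # Reject files that exceed the configured size limit.
        if len(candidate) > self.max_features:
            return False
        # Reject files that contain duplicate feature rows.
        if len(set(candidate)) != len(candidate):
            return False
        self.active = list(candidate)
        return True


store = FeatureStore(max_features=100)  # hypothetical limit

good = [f"feature_{i}" for i in range(60)]
accepted_good = store.try_update(good)

# Simulate the bad file: every row appears twice, doubling the size.
bad = good + good
accepted_bad = store.try_update(bad)

print(accepted_good, accepted_bad, len(store.active))
```

In this sketch the doubled file is rejected and the store continues to serve the 60 features from the earlier file, which mirrors the manual mitigation of putting a known-good file back in place.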