Hasty Briefs (beta)

Cloudflare incident on August 21, 2025

2 days ago
  • #Network Congestion
  • #Cloudflare
  • #AWS
  • On August 21, 2025, a traffic surge from a single customer in AWS us-east-1 severely congested the links between Cloudflare and that AWS region, causing high latency, packet loss, and connection failures.
  • The incident started at 16:27 UTC and was mostly resolved by 19:38 UTC, with intermittent issues until 20:18 UTC.
  • The surge overloaded Cloudflare's links with AWS us-east-1. AWS then withdrew some BGP advertisements to relieve the congestion, but this shifted traffic onto other links, which became congested in turn and made the problem worse.
  • Cloudflare's network did not have enough capacity to absorb the surge, partly because one link was already running at half capacity and a planned Data Center Interconnect (DCI) upgrade had not yet been completed.
  • Cloudflare and AWS worked together to mitigate the issue, including rate-limiting the customer responsible for the surge and adjusting BGP advertisements.
  • The incident exposed the need for stronger isolation between customers, so that one customer's traffic cannot degrade service for others, and for more network capacity.
  • Short-term fixes include selectively deprioritizing traffic from customers that cause congestion and expediting the DCI upgrades.
  • Longer term, Cloudflare plans to build a new traffic management system that allocates network resources per customer and responds to congestion automatically.
  • Cloudflare apologized for the disruption and is implementing improvements to prevent recurrence.
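The brief does not describe how Cloudflare's planned per-customer traffic management works. As a rough illustration of the idea of per-customer budgets with deprioritization, here is a minimal sketch using a classic token bucket per customer. It is a hypothetical example, not Cloudflare's design: the class names, the byte-based budget, the rate and burst values, and the choice to mark over-budget traffic as low priority (rather than drop it) are all assumptions.

```python
class TokenBucket:
    """Token bucket: refills at `rate` bytes/s and holds at most `burst` bytes."""

    def __init__(self, rate: float, burst: float, now: float) -> None:
        self.rate = rate
        self.burst = burst
        self.tokens = burst  # start with a full bucket
        self.last = now

    def take(self, size: float, now: float) -> bool:
        """Refill for the elapsed time, then spend `size` tokens if available."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= size:
            self.tokens -= size
            return True
        return False


class PerCustomerShaper:
    """Hypothetical per-customer budget. Traffic within a customer's budget keeps
    normal priority; traffic over budget is marked low priority, not dropped."""

    def __init__(self, rate: float, burst: float) -> None:
        self.rate = rate
        self.burst = burst
        self.buckets: dict[str, TokenBucket] = {}

    def classify(self, customer: str, size: float, now: float) -> str:
        bucket = self.buckets.get(customer)
        if bucket is None:
            # Each customer gets its own bucket, so one customer's burst
            # cannot spend another customer's budget.
            bucket = TokenBucket(self.rate, self.burst, now)
            self.buckets[customer] = bucket
        return "normal" if bucket.take(size, now) else "low"


if __name__ == "__main__":
    shaper = PerCustomerShaper(rate=1000.0, burst=1500.0)  # illustrative numbers
    print(shaper.classify("a", 1500, now=0.0))  # normal: within budget
    print(shaper.classify("a", 1500, now=0.0))  # low: customer "a" is over budget
    print(shaper.classify("b", 1500, now=0.0))  # normal: "b" is unaffected by "a"
    print(shaper.classify("a", 1500, now=1.5))  # normal: bucket refilled after 1.5 s
```

Marking instead of dropping means an over-budget customer only loses out when a link is actually congested and queues favor "normal" traffic; a hard rate limit, like the one applied during the incident, would discard the excess regardless of link load.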