Cloudflare incident on August 21, 2025
- #Network Congestion
- #Cloudflare
- #AWS
- On August 21, 2025, a traffic surge from a single customer hosted in AWS us-east-1 caused severe congestion on the links between Cloudflare and AWS us-east-1, leading to high latency, packet loss, and connection failures.
- The incident started at 16:27 UTC and was mostly resolved by 19:38 UTC, with intermittent issues until 20:18 UTC.
- The surge overloaded Cloudflare's direct links with AWS us-east-1; AWS's attempt to alleviate the congestion by withdrawing BGP advertisements made it worse, since the same traffic then concentrated onto the remaining paths (a toy illustration of this effect appears in the first sketch below).
- Cloudflare's internal network capacity was insufficient to absorb the surge, partly because of a pre-existing half-capacity link and a data center interconnect (DCI) upgrade that was still pending.
- Cloudflare and AWS collaborated to mitigate the issue, including rate-limiting the problematic customer's traffic and adjusting BGP advertisements (a per-customer rate-limiting sketch follows below).
- The incident highlighted the need for better customer isolation and network capacity to prevent similar issues in the future.
- Short-term remediations include deprioritizing traffic from customers that cause congestion and expediting the pending DCI upgrades.
- Long-term, Cloudflare plans to build a new traffic management system that allocates network resources per customer and automates congestion responses (the last sketch below illustrates the idea).
- Cloudflare apologized for the disruption and is implementing improvements to prevent recurrence.
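
The bullet about BGP withdrawals glosses over why the withdrawal made things worse. The sketch below is a toy model, not anything Cloudflare or AWS actually ran, and the link names, capacities, and demand figures are made up; it only illustrates that withdrawing routes does not reduce demand, it concentrates the same traffic onto whichever links still advertise the prefixes.

```python
# Toy model (hypothetical link names, capacities, and demand): withdrawing
# routes from one congested link pushes its traffic onto the links that
# still advertise the prefixes, raising their utilization.

def utilization(demand_gbps: float, links: dict[str, float]) -> dict[str, float]:
    """Spread a fixed demand evenly across the links that remain advertised."""
    share = demand_gbps / len(links)
    return {name: round(share / capacity, 2) for name, capacity in links.items()}

demand = 900.0                                             # Gbps of customer traffic
links = {"pni-1": 400.0, "pni-2": 400.0, "pni-3": 400.0}   # Gbps capacities

print(utilization(demand, links))   # {'pni-1': 0.75, 'pni-2': 0.75, 'pni-3': 0.75}
links.pop("pni-3")                  # advertisements withdrawn from one link
print(utilization(demand, links))   # {'pni-1': 1.12, 'pni-2': 1.12} -> overloaded
```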
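The joint mitigation included rate-limiting the problematic customer. These notes do not describe the exact mechanism, but a per-customer token bucket is one common way to do it; the class, rates, and customer labels below are purely illustrative.

```python
import time
from dataclasses import dataclass, field

@dataclass
class TokenBucket:
    """Classic token bucket: admits up to `rate` bytes/s with bursts up to `burst` bytes."""
    rate: float                                   # refill rate, bytes per second
    burst: float                                  # bucket capacity, bytes
    tokens: float = 0.0
    last: float = field(default_factory=time.monotonic)

    def __post_init__(self) -> None:
        self.tokens = self.burst                  # start full so normal traffic is unaffected

    def allow(self, nbytes: int) -> bool:
        now = time.monotonic()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= nbytes:
            self.tokens -= nbytes
            return True
        return False                              # over the allowance: drop or queue

# One bucket per customer; the customer driving the surge gets a much tighter limit.
limits = {
    "surging-customer": TokenBucket(rate=1e9, burst=5e8),    # illustrative figures
    "typical-customer": TokenBucket(rate=1e10, burst=5e9),
}

def admit(customer: str, nbytes: int) -> bool:
    return limits[customer].allow(nbytes)
```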
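The longer-term plan is a traffic management system that allocates resources per customer and reacts to congestion automatically. As a rough sketch of what an automated response could look like (the threshold, link size, and customer names are assumptions, not Cloudflare's design): when a link crosses a utilization threshold, pick the heaviest contributors and deprioritize only their traffic.

```python
# Rough sketch of an automated congestion response (all thresholds and figures
# are assumptions): when a link is nearly full, deprioritize the heaviest
# customers instead of letting their traffic degrade everyone sharing the link.

CONGESTION_THRESHOLD = 0.90          # assumed utilization trigger

def congestion_response(link_capacity_gbps: float,
                        per_customer_gbps: dict[str, float],
                        max_deprioritized: int = 1) -> list[str]:
    """Return the customers whose traffic should be moved to a lower-priority class."""
    utilization = sum(per_customer_gbps.values()) / link_capacity_gbps
    if utilization < CONGESTION_THRESHOLD:
        return []
    heaviest = sorted(per_customer_gbps, key=per_customer_gbps.get, reverse=True)
    return heaviest[:max_deprioritized]

# Example: one customer dominates a 400 Gbps link, so only it is deprioritized.
print(congestion_response(400.0, {"cust-a": 320.0, "cust-b": 60.0, "cust-c": 30.0}))
# -> ['cust-a']
```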