BGP zombies and excessive path hunting
22 days ago
- #Cloudflare
- #Networking
- #BGP
- Cloudflare discusses BGP zombies: routes that stay stuck in the Internet’s Default-Free Zone (DFZ) after the origin has withdrawn them, because a prefix withdrawal was missed or lost along the way.
- BGP zombies can disrupt traffic by causing route loops or inefficient paths, creating headaches for network operators.
- Path hunting in BGP occurs after a more-specific prefix is withdrawn: routers try alternative, often longer, paths for that prefix before they converge, which delays the withdrawal from taking full effect.
- The Minimum Route Advertisement Interval (MRAI) spaces out successive BGP updates, which prolongs path hunting and increases the chance of zombies.
- Cloudflare observed BGP zombies in both upstream ISP networks and internal LANs, with some routes stuck for over 10 minutes.
- IPv4 path hunting delays are worse than IPv6, likely due to the larger number of IPv4 prefixes in global routing tables.
- Cloudflare suggests improvements like graceful traffic forwarding and multi-step draining to mitigate BGP zombie impacts.
- Customers using on-demand BYOIP should announce the same-length prefix natively before withdrawing from Cloudflare to avoid zombies.
- RFC 9687 (BGP SendHoldTimer) can help detect unresponsive routers, reducing long-lived zombies.
- Cloudflare encourages network operators to be cautious with more-specific prefix announcements to minimize BGP convergence issues.
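
To illustrate why a zombie more-specific route matters, here is a minimal Python sketch of longest-prefix-match forwarding. It is not taken from the Cloudflare post: the prefixes come from the IPv6 documentation range, and the router tables, next-hop labels, and the `best_route` helper are all hypothetical names chosen for the example. It assumes a router simply forwards on the longest matching prefix it still holds.

```python
import ipaddress


def best_route(table, destination):
    """Return the (prefix, next_hop) pair with the longest matching prefix, or None."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in table.items() if addr in net]
    if not matches:
        return None
    # Longest-prefix match: the most specific covering prefix wins.
    return max(matches, key=lambda entry: entry[0].prefixlen)


# Hypothetical prefixes from the IPv6 documentation range (2001:db8::/32).
covering = ipaddress.ip_network("2001:db8::/32")
more_specific = ipaddress.ip_network("2001:db8:1::/48")

# A router that processed the withdrawal only holds the covering prefix.
healthy_router = {covering: "current-origin"}

# A router that missed the withdrawal still holds the stale more-specific (a zombie).
zombie_router = {covering: "current-origin", more_specific: "old-origin"}

destination = "2001:db8:1::10"
healthy_choice = best_route(healthy_router, destination)
zombie_choice = best_route(zombie_router, destination)

print("healthy router forwards via:", healthy_choice[1])
print("zombie router forwards via: ", zombie_choice[1])
```

Under these assumptions, the router holding the zombie keeps sending traffic toward the old origin even though a valid covering route exists, which is why a stuck more-specific can cause loops or inefficient paths.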