BGP zombies and excessive path hunting

22 days ago
  • #Cloudflare
  • #Networking
  • #BGP
  • Cloudflare discusses BGP zombies, which are routes stuck in the Internet’s Default-Free Zone (DFZ) due to missed or lost prefix withdrawals.
  • BGP zombies can disrupt traffic by causing route loops or inefficient paths, creating headaches for network operators.
  • Path hunting occurs after a more-specific prefix is withdrawn: routers work through successively longer alternative paths to that prefix before giving up on it, which delays convergence.
  • The Minimum Route Advertisement Interval (MRAI) adds delay to BGP updates, prolonging path hunting and increasing the chance of zombies.
  • Cloudflare observed BGP zombies in both upstream ISP networks and internal LANs, with some routes stuck for over 10 minutes.
  • Path hunting delays are longer for IPv4 than for IPv6, likely due to the larger number of IPv4 prefixes in global routing tables.
  • Cloudflare suggests improvements like graceful traffic forwarding and multi-step draining to mitigate BGP zombie impacts.
  • Customers using on-demand BYOIP should announce the same-length prefix natively before withdrawing from Cloudflare to avoid zombies.
  • RFC 9687 (BGP SendHoldTimer) can help detect unresponsive routers, reducing long-lived zombies.
  • Cloudflare encourages network operators to be cautious with more-specific prefix announcements to minimize BGP convergence issues.
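
To illustrate why a stuck more-specific route is disruptive, here is a minimal sketch of longest-prefix-match forwarding. It is not taken from the Cloudflare post: the `best_route` helper, the documentation prefixes under `2001:db8::/32`, and the next-hop labels are all hypothetical. The point is only that a router prefers the longest matching prefix, so a zombie /48 keeps attracting traffic even though a valid covering /32 is present.

```python
import ipaddress


def best_route(table, destination):
    """Return the (prefix, next_hop) pair with the longest matching prefix, or None."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in table.items() if addr in net]
    if not matches:
        return None
    # Longest-prefix match: the most specific covering prefix wins.
    return max(matches, key=lambda entry: entry[0].prefixlen)


# Hypothetical routing table built from documentation address space.
table = {
    ipaddress.ip_network("2001:db8::/32"): "origin-network",        # valid covering route
    ipaddress.ip_network("2001:db8:1::/48"): "stale-peer (zombie)",  # withdrawal was lost
}

# While the zombie is present, traffic follows the stale more-specific route.
with_zombie = best_route(table, "2001:db8:1::10")

# Once the withdrawal is finally processed, traffic falls back to the covering route.
del table[ipaddress.ip_network("2001:db8:1::/48")]
after_cleanup = best_route(table, "2001:db8:1::10")
```

In this toy table the same destination resolves to the stale next hop first and to the origin network only after the zombie entry is removed, which is the behaviour the same-length announcement advice for on-demand BYOIP is meant to avoid.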
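
The interaction between path hunting and MRAI can also be sketched with a deliberately simplified model. It assumes a single router that holds several stale alternative paths for a withdrawn prefix, that each stale path is withdrawn one at a time from shortest to longest, and that successive withdrawals arrive one MRAI apart. Real BGP convergence depends on topology, implementation, and timer jitter, so the function name, the AS numbers (documentation ASNs), and the 30-second value are illustrative assumptions rather than measurements from the post.

```python
def simulate_path_hunting(stale_paths, mrai_seconds):
    """Return a list of (elapsed_seconds, withdrawn_path, new_best_path) steps.

    Simplified model: paths are tried shortest first, and each further
    withdrawal is assumed to be delayed by one MRAI interval.
    """
    remaining = sorted(stale_paths, key=len)
    timeline = []
    elapsed = 0
    while remaining:
        withdrawn = remaining.pop(0)               # current best path is withdrawn
        best = remaining[0] if remaining else None  # fall back to the next-shortest path
        timeline.append((elapsed, withdrawn, best))
        if remaining:
            elapsed += mrai_seconds                 # next withdrawal waits for MRAI
    return timeline


# Hypothetical AS paths toward the withdrawn prefix (documentation ASNs).
paths = [
    [64496, 64499],
    [64497, 64496, 64499],
    [64498, 64497, 64496, 64499],
]

# 30 seconds is the eBGP value suggested in RFC 4271; deployments vary.
timeline = simulate_path_hunting(paths, mrai_seconds=30)
for elapsed, withdrawn, best in timeline:
    print(f"t={elapsed:>3}s withdrawn={withdrawn} new_best={best}")
```

Under these assumptions the router only gives up on the prefix after working through every stale path, so the hunting time grows with both the number of alternative paths and the MRAI value.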