Fixing DNS tail latency with a 5-line config and a 50-line function
a day ago
- #request hedging
- #DNS latency
- #HTTP/2 optimization
- Hyper's default HTTP/2 flow control windows (2 MB per stream, 5 MB per connection) are oversized for small DNS payloads and contribute to tail latency; reducing both to 64 KB improved median latency from 13.3 ms to 10.1 ms.
- Request hedging, inspired by "The Tail at Scale," sends a duplicate request when the first has not completed within a short delay and takes whichever response arrives first. This mitigates random latency spikes and reduced p95 latency by 45% and p99 by 37%.
- The remaining tail latency in hyper comes from an mpsc dispatch channel that stalls periodically, not from the async runtime, as confirmed by instrumentation and by comparison with hickory-resolver.
- For recursive DNS resolution, fixes like caching NS delegations, serve-stale, reduced timeouts, and parallel queries cut cold p99 from 2.3 seconds to 538ms, beating Unbound in some metrics.
- Hedging is effective for HTTP/2 clients that see random latency spikes, but it can hurt performance on consistently degraded networks, because it narrows variance rather than lowering baseline latency.
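
The flow control change in the first point can be sketched as a short configuration fragment. This is a hedged illustration, not the post's actual config: it assumes hyper 1.x with the `client` and `http2` features plus the `hyper-util` crate for `TokioExecutor`, and the names `WINDOW` and `small_window_builder` are hypothetical.

```rust
use hyper::client::conn::http2;
use hyper_util::rt::TokioExecutor;

// 64 KB for both windows, the value quoted in the summary above.
const WINDOW: u32 = 64 * 1024;

// Hypothetical helper: returns an HTTP/2 connection builder whose
// per-stream and per-connection flow control windows are 64 KB.
fn small_window_builder() -> http2::Builder<TokioExecutor> {
    let mut builder = http2::Builder::new(TokioExecutor::new());
    builder
        .initial_stream_window_size(WINDOW)
        .initial_connection_window_size(WINDOW);
    builder
}
```

Higher-level clients built on hyper expose similar window settings on their own builders, so the same two values may need to be set there instead.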
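
The hedging idea from the points above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the post's 50-line function: it uses OS threads and a channel instead of async tasks on an HTTP/2 connection, and the names `hedged` and `spawn_attempt` are hypothetical. The losing attempt is left to finish in the background and its result is discarded.

```rust
use std::sync::mpsc;
use std::sync::Arc;
use std::thread;
use std::time::Duration;

// Runs one attempt on its own thread and reports (result, attempt index).
fn spawn_attempt<T, F>(tx: mpsc::Sender<(T, usize)>, request: Arc<F>, attempt: usize)
where
    T: Send + 'static,
    F: Fn(usize) -> T + Send + Sync + 'static,
{
    thread::spawn(move || {
        // Sending fails only when the receiver is gone because the
        // other attempt already won, so the error is ignored.
        let _ = tx.send((request(attempt), attempt));
    });
}

/// Starts `request` once. If no result arrives within `hedge_after`,
/// starts a second identical attempt and returns whichever finishes
/// first, together with the index of the winning attempt (0 or 1).
pub fn hedged<T, F>(request: F, hedge_after: Duration) -> (T, usize)
where
    T: Send + 'static,
    F: Fn(usize) -> T + Send + Sync + 'static,
{
    let request = Arc::new(request);
    let (tx, rx) = mpsc::channel();

    spawn_attempt(tx.clone(), Arc::clone(&request), 0);
    match rx.recv_timeout(hedge_after) {
        // The first attempt answered in time, so no duplicate is sent.
        Ok(result) => result,
        // Too slow: hedge with a second attempt and take the first answer.
        Err(_) => {
            spawn_attempt(tx, request, 1);
            rx.recv().expect("both attempts ended without a result")
        }
    }
}
```

A caller would pass a closure that performs the query and a delay chosen from the observed latency distribution. "The Tail at Scale" suggests a delay near the 95th percentile, which keeps the extra load to roughly 5% of requests.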