How OpenAI delivers low-latency voice AI at scale

5 hours ago

OpenAI rearchitected its WebRTC stack for low-latency voice AI at scale.
Key requirements include global reach for more than 900 million weekly active users, fast connection setup, and a low, stable media round-trip time.
The architecture splits the stack into a relay and a transceiver, which avoids allocating one public port per session while preserving standard WebRTC behavior.
The relay handles UDP packet forwarding with a small public footprint, while the transceiver owns full WebRTC session state.
The relay routes on the ICE username fragment (ufrag), so the first packet of a session reaches the right transceiver deterministically and without external dependencies.
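To make the ufrag routing idea concrete, here is a minimal Go sketch. It parses the USERNAME attribute from a STUN binding request, which ICE connectivity checks carry in the form `receiverUfrag:senderUfrag`, and derives a routing key from the receiver's ufrag. The summary does not say how the ufrag encodes the destination, so the fixed 4-character transceiver ID prefix is an assumption. The names `transceiverFor`, `idLen`, and `buildBindingRequest` are hypothetical and not taken from OpenAI's code.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"strings"
)

const (
	stunHeaderLen   = 20         // type(2) + length(2) + cookie(4) + transaction ID(12)
	stunMagicCookie = 0x2112A442 // fixed value in every STUN message
	attrUsername    = 0x0006     // USERNAME attribute type
	idLen           = 4          // ASSUMPTION: ufrag starts with a 4-char transceiver ID
)

// extractUsername returns the USERNAME attribute of a STUN message.
func extractUsername(pkt []byte) (string, error) {
	if len(pkt) < stunHeaderLen || pkt[0]&0xC0 != 0 {
		return "", errors.New("not a STUN message")
	}
	if binary.BigEndian.Uint32(pkt[4:8]) != stunMagicCookie {
		return "", errors.New("missing STUN magic cookie")
	}
	msgLen := int(binary.BigEndian.Uint16(pkt[2:4]))
	if stunHeaderLen+msgLen > len(pkt) {
		return "", errors.New("truncated STUN message")
	}
	attrs := pkt[stunHeaderLen : stunHeaderLen+msgLen]
	for len(attrs) >= 4 {
		typ := binary.BigEndian.Uint16(attrs[0:2])
		l := int(binary.BigEndian.Uint16(attrs[2:4]))
		if 4+l > len(attrs) {
			return "", errors.New("truncated attribute")
		}
		if typ == attrUsername {
			return string(attrs[4 : 4+l]), nil
		}
		padded := (l + 3) &^ 3 // attribute values are padded to 4 bytes
		if 4+padded > len(attrs) {
			break
		}
		attrs = attrs[4+padded:]
	}
	return "", errors.New("no USERNAME attribute")
}

// transceiverFor maps a USERNAME to a routing key using the assumed
// prefix encoding; a real deployment may encode the target differently.
func transceiverFor(username string) (string, error) {
	ufrag, _, ok := strings.Cut(username, ":")
	if !ok || len(ufrag) < idLen {
		return "", errors.New("unexpected USERNAME format")
	}
	return ufrag[:idLen], nil
}

// buildBindingRequest builds a bare binding request for the demo only
// (zero transaction ID, no integrity or fingerprint attributes).
func buildBindingRequest(username string) []byte {
	padded := (len(username) + 3) &^ 3
	pkt := make([]byte, stunHeaderLen+4+padded)
	binary.BigEndian.PutUint16(pkt[0:2], 0x0001) // binding request
	binary.BigEndian.PutUint16(pkt[2:4], uint16(4+padded))
	binary.BigEndian.PutUint32(pkt[4:8], stunMagicCookie)
	binary.BigEndian.PutUint16(pkt[20:22], attrUsername)
	binary.BigEndian.PutUint16(pkt[22:24], uint16(len(username)))
	copy(pkt[24:], username)
	return pkt
}

func main() {
	pkt := buildBindingRequest("tx07Ab3dEf9:clientFrag")
	user, err := extractUsername(pkt)
	if err != nil {
		panic(err)
	}
	id, err := transceiverFor(user)
	if err != nil {
		panic(err)
	}
	fmt.Println("route to transceiver:", id)
}
```

Because the key is read from the packet itself, a relay built this way needs no shared session store to place the first packet. Later media packets do not carry the ufrag, so a real relay would presumably also remember the chosen target per client address; that part is not shown here.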
Global Relay provides geographically distributed ingress points to reduce first-hop latency and improve user experience.
The implementation is written in Go and relies on techniques such as SO_REUSEPORT and thread pinning rather than kernel bypass.
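As an illustration of those two techniques, the following sketch opens several UDP sockets on the same port with SO_REUSEPORT and runs one reader per socket on a locked OS thread. It is a Linux-only sketch under stated assumptions, not OpenAI's code: `listenReusePort` and `serve` are hypothetical names, the option value `0xf` is hardcoded for common Linux architectures, and `runtime.LockOSThread` only ties a goroutine to a thread. Binding that thread to a specific CPU core would additionally need a `sched_setaffinity` call, for example through `golang.org/x/sys/unix`.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"runtime"
	"syscall"
)

// soReusePort is the Linux SO_REUSEPORT value on x86-64 and arm64.
// ASSUMPTION: hardcoded to keep the sketch stdlib-only.
const soReusePort = 0xf

// listenReusePort opens a UDP socket with SO_REUSEPORT set before bind,
// so several sockets can share one address and the kernel spreads
// incoming flows across them.
func listenReusePort(addr string) (net.PacketConn, error) {
	lc := net.ListenConfig{
		Control: func(network, address string, c syscall.RawConn) error {
			var sockErr error
			if err := c.Control(func(fd uintptr) {
				sockErr = syscall.SetsockoptInt(int(fd), syscall.SOL_SOCKET, soReusePort, 1)
			}); err != nil {
				return err
			}
			return sockErr
		},
	}
	return lc.ListenPacket(context.Background(), "udp4", addr)
}

// serve reads datagrams from one socket on a dedicated OS thread
// until the socket is closed.
func serve(conn net.PacketConn, handle func(pkt []byte, from net.Addr)) {
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()
	buf := make([]byte, 1500)
	for {
		n, from, err := conn.ReadFrom(buf)
		if err != nil {
			return
		}
		handle(buf[:n], from)
	}
}

func main() {
	// Let the kernel pick a free port, then bind more sockets to it.
	first, err := listenReusePort("127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	addr := first.LocalAddr().String()
	conns := []net.PacketConn{first}
	for i := 1; i < 4; i++ {
		c, err := listenReusePort(addr)
		if err != nil {
			panic(err)
		}
		conns = append(conns, c)
	}

	got := make(chan string, 1)
	for _, c := range conns {
		go serve(c, func(pkt []byte, _ net.Addr) { got <- string(pkt) })
	}

	client, err := net.Dial("udp4", addr)
	if err != nil {
		panic(err)
	}
	defer client.Close()
	if _, err := client.Write([]byte("ping")); err != nil {
		panic(err)
	}
	fmt.Println("received:", <-got)

	for _, c := range conns {
		c.Close()
	}
}
```

The appeal of this approach is that it stays within the ordinary socket API: the kernel distributes flows across the per-thread sockets, so no kernel-bypass framework is required to spread packet handling over cores.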
This design enables Kubernetes scalability, simplifies security, and maintains client interoperability.

Hasty Briefs (beta)