How OpenAI delivers low-latency voice AI at scale
- #Real-time AI
- #WebRTC
- #OpenAI
- OpenAI rearchitected its WebRTC stack for low-latency voice AI at scale.
- Key requirements include global reach for 900+ million weekly active users, fast connection setup, and a low, stable media round-trip time.
- The architecture uses a split relay plus transceiver model to avoid one-port-per-session issues and preserve standard WebRTC behavior.
- The relay handles UDP packet forwarding with a small public footprint, while the transceiver owns full WebRTC session state.
- Routing is based on the ICE username fragment (ufrag) for deterministic first-packet routing without external dependencies.
- Global Relay provides geographically distributed ingress points to reduce first-hop latency and improve user experience.
- The Go implementation relies on techniques such as SO_REUSEPORT and OS thread pinning for efficiency, without resorting to kernel bypass.
- This design lets the stack scale on Kubernetes, simplifies the security surface, and preserves interoperability with standard WebRTC clients.