Hasty Briefs (beta)

How OpenAI delivers low-latency voice AI at scale

5 hours ago
  • #Real-time AI
  • #WebRTC
  • #OpenAI
  • OpenAI rearchitected its WebRTC stack for low-latency voice AI at scale.
  • Key requirements are global reach for more than 900 million weekly active users, fast connection setup, and a low, stable media round-trip time.
  • The architecture splits the stack into a relay and a transceiver, which avoids the one-port-per-session problem while preserving standard WebRTC behavior.
  • The relay handles UDP packet forwarding with a small public footprint, while the transceiver owns full WebRTC session state.
  • Routing is based on the ICE username fragment (ufrag) for deterministic first-packet routing without external dependencies.
  • Global Relay provides geographically distributed ingress points to reduce first-hop latency and improve user experience.
  • The implementation is written in Go and relies on techniques such as SO_REUSEPORT and thread pinning rather than kernel bypass.
  • This design enables Kubernetes scalability, simplifies security, and maintains client interoperability.
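
To make the ufrag-based routing point concrete, here is a minimal Go sketch of how a relay could pick a transceiver from the very first packet. It parses the USERNAME attribute of a STUN binding request, which ICE forms as `<receiver ufrag>:<sender ufrag>`, and maps the receiver-side ufrag to a backend. The summary does not say how OpenAI encodes routing information in the ufrag, so the 4-character prefix scheme, the `backends` table, and the function names `localUfrag`, `backendFor`, and `bindingRequest` are assumptions made purely for illustration.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"strings"
)

const (
	stunHeaderLen   = 20         // fixed STUN header size
	stunMagicCookie = 0x2112A442 // fixed value in bytes 4..8 of every STUN message
	attrUsername    = 0x0006     // USERNAME attribute type
)

// localUfrag returns the receiver-side ufrag (the part before ':') from the
// USERNAME attribute of a STUN message, or false if pkt is not such a message.
func localUfrag(pkt []byte) (string, bool) {
	// STUN messages start with two zero bits and carry the magic cookie.
	if len(pkt) < stunHeaderLen || pkt[0]&0xC0 != 0 {
		return "", false
	}
	if binary.BigEndian.Uint32(pkt[4:8]) != stunMagicCookie {
		return "", false
	}
	msgLen := int(binary.BigEndian.Uint16(pkt[2:4]))
	if stunHeaderLen+msgLen > len(pkt) {
		return "", false
	}
	attrs := pkt[stunHeaderLen : stunHeaderLen+msgLen]
	for len(attrs) >= 4 {
		typ := binary.BigEndian.Uint16(attrs[0:2])
		n := int(binary.BigEndian.Uint16(attrs[2:4]))
		if 4+n > len(attrs) {
			return "", false
		}
		if typ == attrUsername {
			local, _, ok := strings.Cut(string(attrs[4:4+n]), ":")
			return local, ok && local != ""
		}
		// Attribute values are padded to a 4-byte boundary.
		padded := (n + 3) &^ 3
		if 4+padded > len(attrs) {
			return "", false
		}
		attrs = attrs[4+padded:]
	}
	return "", false
}

// backendFor maps a ufrag to a transceiver address using a hypothetical
// convention: the first 4 characters of the ufrag name the transceiver.
func backendFor(ufrag string, backends map[string]string) (string, bool) {
	if len(ufrag) < 4 {
		return "", false
	}
	addr, ok := backends[ufrag[:4]]
	return addr, ok
}

// bindingRequest builds a bare STUN binding request carrying only USERNAME.
// It exists only to exercise the parser; the transaction ID is left as zeros.
func bindingRequest(username string) []byte {
	padded := (len(username) + 3) &^ 3
	pkt := make([]byte, stunHeaderLen+4+padded)
	binary.BigEndian.PutUint16(pkt[0:2], 0x0001) // Binding request
	binary.BigEndian.PutUint16(pkt[2:4], uint16(4+padded))
	binary.BigEndian.PutUint32(pkt[4:8], stunMagicCookie)
	binary.BigEndian.PutUint16(pkt[20:22], attrUsername)
	binary.BigEndian.PutUint16(pkt[22:24], uint16(len(username)))
	copy(pkt[24:], username)
	return pkt
}

func main() {
	backends := map[string]string{"tx07": "10.0.0.7:3478"} // illustrative table
	pkt := bindingRequest("tx07AbCd:clientFrag")
	if ufrag, ok := localUfrag(pkt); ok {
		addr, found := backendFor(ufrag, backends)
		fmt.Println(ufrag, addr, found) // prints "tx07AbCd 10.0.0.7:3478 true"
	}
}
```

Because the ufrag travels in the first STUN binding request, a relay built along these lines can choose the destination from the packet alone, which is one plausible reading of "deterministic first-packet routing without external dependencies".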