OpenAI explains its low-latency voice stack: relay + transceiver WebRTC architecture

OpenAI detailed how it reworked WebRTC at global scale to keep voice interactions responsive. The design splits packet routing (relay) from session termination (transceiver) to reduce public UDP surface area while preserving session ownership.

OpenAI published a technical deep dive into how it delivers **low-latency voice AI** for ChatGPT voice and the Realtime API.

## The problem: voice UX punishes latency

Voice feels natural only when turn-taking is fast and stable. At OpenAI’s scale, that translates to:

- fast connection setup

- low, stable media round-trip time (RTT)

- low jitter/packet loss across global networks
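To make "jitter" concrete, here is a minimal sketch of the interarrival jitter estimator defined for RTP in RFC 3550. The source does not say how OpenAI measures jitter, so the class name and the choice of milliseconds as the unit are assumptions made for illustration.

```python
from typing import Optional


class JitterEstimator:
    """Running interarrival jitter as defined in RFC 3550, section 6.4.1.

    Both arguments to update() must use the same unit (milliseconds here,
    which is an assumption of this sketch).
    """

    def __init__(self) -> None:
        self.jitter = 0.0
        self._prev_transit: Optional[float] = None

    def update(self, arrival_ms: float, send_ms: float) -> float:
        # Relative transit time: clock offset cancels when differences are taken.
        transit = arrival_ms - send_ms
        if self._prev_transit is not None:
            delta = abs(transit - self._prev_transit)
            # Exponential smoothing with gain 1/16, per the RFC.
            self.jitter += (delta - self.jitter) / 16.0
        self._prev_transit = transit
        return self.jitter
```

A packet that arrives 8 ms later than its spacing at the sender implies moves the estimate by 8/16 = 0.5 ms, so a single late packet nudges the value rather than dominating it.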

## Why WebRTC

OpenAI emphasizes WebRTC’s standardized solutions for:

- NAT traversal (ICE)

- encrypted transport (DTLS + SRTP)

- codec negotiation

- network adaptation and quality control (RTCP)
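In a typical WebRTC deployment these protocols share a single UDP flow, and a receiver tells them apart by the first byte of each packet, following the ranges in RFC 7983. The sketch below illustrates that standard scheme only; the source does not describe OpenAI's classifier, and the function name is hypothetical.

```python
def classify_packet(packet: bytes) -> str:
    """Classify a UDP payload by its first byte, using RFC 7983 ranges."""
    if not packet:
        return "unknown"
    first = packet[0]
    if first <= 3:
        return "stun"          # ICE connectivity checks
    if 20 <= first <= 63:
        return "dtls"          # handshake and key exchange
    if 128 <= first <= 191:
        return "rtp_or_rtcp"   # media and feedback, protected by SRTP/SRTCP
    return "unknown"           # ZRTP and TURN channel ranges are ignored here
```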

For AI workloads, streaming audio lets transcription, reasoning, and speech generation begin **while the user is still talking**.

## Key architectural shift: split relay + transceiver

OpenAI reports that the classic approach of terminating each session on its own UDP port becomes operationally painful at high concurrency: it complicates port management, widens the security policy surface area, and creates friction with Kubernetes autoscaling.

Their approach separates responsibilities:

- **Relay:** a lightweight UDP forwarder with a small public footprint

- **Transceiver:** the stateful owner of each WebRTC session (ICE/DTLS/SRTP/session lifecycle)

A crucial trick is routing the first packets of a session using an identifier WebRTC already provides: the ICE username fragment (**ufrag**), which is carried in the USERNAME attribute of STUN connectivity checks.
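As a rough illustration of ufrag-based routing, the sketch below parses a STUN message, reads its USERNAME attribute, and looks up the receiver-side ufrag in a routing table. In ICE the USERNAME of a connectivity check has the form `<receiver ufrag>:<sender ufrag>`, so the first half is the fragment the server side chose. The routing table, the addresses, and all function names are hypothetical; the source does not say how the relay maps a ufrag to a transceiver.

```python
import struct
from typing import Dict, Optional, Tuple

MAGIC_COOKIE = 0x2112A442  # fixed value in every STUN header (RFC 8489)
ATTR_USERNAME = 0x0006     # STUN USERNAME attribute type
HEADER_LEN = 20

Address = Tuple[str, int]


def extract_routing_ufrag(packet: bytes) -> Optional[str]:
    """Return the first half of USERNAME (the receiver's ufrag), or None."""
    if len(packet) < HEADER_LEN:
        return None
    msg_type, msg_len, cookie = struct.unpack_from("!HHI", packet, 0)
    if msg_type & 0xC000 or cookie != MAGIC_COOKIE:
        return None  # not a STUN message
    end = min(len(packet), HEADER_LEN + msg_len)
    offset = HEADER_LEN
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack_from("!HH", packet, offset)
        value = packet[offset + 4 : offset + 4 + attr_len]
        if attr_type == ATTR_USERNAME:
            if len(value) < attr_len:
                return None  # truncated attribute
            try:
                return value.decode("utf-8").split(":", 1)[0]
            except UnicodeDecodeError:
                return None
        offset += 4 + attr_len + (-attr_len % 4)  # values are padded to 4 bytes
    return None


def route_first_packet(packet: bytes, routes: Dict[str, Address]) -> Optional[Address]:
    """Pick a transceiver for a session's first packet (hypothetical table lookup)."""
    ufrag = extract_routing_ufrag(packet)
    return routes.get(ufrag) if ufrag is not None else None


def build_binding_request(username: str) -> bytes:
    """Build a bare STUN Binding request for demos; omits MESSAGE-INTEGRITY."""
    value = username.encode("utf-8")
    padded = value + b"\x00" * (-len(value) % 4)
    attr = struct.pack("!HH", ATTR_USERNAME, len(value)) + padded
    header = struct.pack("!HHI", 0x0001, len(attr), MAGIC_COOKIE) + bytes(12)
    return header + attr
```

For example, `route_first_packet(build_binding_request("srvA:cliB"), {"srvA": ("10.0.0.7", 40000)})` returns the made-up transceiver address `("10.0.0.7", 40000)`. A production relay would presumably also validate the message and remember the client's source address so later DTLS and SRTP packets, which carry no ufrag, follow the same route; that part is not shown.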

## Takeaways for builders

If you are building real-time voice or agentic systems:

- budget for networking architecture early (especially first-hop latency)

- treat session ownership as a core scaling constraint

- minimize externally exposed UDP ranges where possible

This write-up is especially relevant for teams building on WebRTC for client-to-server AI interactions.