OpenAI ships GPT-Realtime-2 with voice reasoning
OpenAI’s GPT-Realtime-2 is a speech-to-speech voice model with GPT-5-class reasoning, now available in the Realtime API. The launch also adds realtime translation and transcription models, pointing to a broader production voice stack for developers.
This is OpenAI trying to make voice agents feel like native model products, not stitched-together STT plus TTS pipelines. The interesting part is less the audio novelty and more the jump in reasoning, tool use, and session memory that makes spoken workflows usable.
- GPT-Realtime-2 adds GPT-5-class reasoning, stronger instruction following, and more reliable tool use for complex voice-agent flows
- The model ships with a 128,000-token context window, which matters for long, interruption-heavy conversations
- OpenAI is pairing it with realtime translation and transcription models, so developers can build full voice stacks without mixing vendors
- Pricing is aimed at production workloads, not toys: $32 per 1M audio input tokens and $64 per 1M audio output tokens
- The real test will be latency and turn-taking under messy, real-world speech, not benchmark hype
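To make the pricing and context figures concrete, here is a rough back-of-the-envelope sketch in Python. It uses only the audio rates and the 128,000-token window quoted above. The function names and the example token counts are hypothetical, and the sketch ignores anything the summary does not mention, such as text tokens or other billing details.

```python
# Rates and window size as quoted in the summary above.
AUDIO_INPUT_USD_PER_MILLION = 32.0
AUDIO_OUTPUT_USD_PER_MILLION = 64.0
CONTEXT_WINDOW_TOKENS = 128_000


def estimate_audio_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated audio cost in USD (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        input_tokens * AUDIO_INPUT_USD_PER_MILLION
        + output_tokens * AUDIO_OUTPUT_USD_PER_MILLION
    )
    return total / 1_000_000


def fits_in_context(total_tokens: int) -> bool:
    """Check a session's token count against the quoted window (hypothetical helper)."""
    return 0 <= total_tokens <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    # Assumed example session: 50,000 audio tokens in, 20,000 out.
    cost = estimate_audio_cost(50_000, 20_000)
    print(f"Estimated audio cost: ${cost:.2f}")
    print("Fits in context:", fits_in_context(50_000 + 20_000))
```

At these rates output audio costs twice as much as input audio, so a talkative agent drives the bill more than a talkative user does.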
Discovered: 2026-05-08
Published: 2026-05-07
Author: OpenAIDevs