OpenAI adds realtime voice models
OpenAI released GPT-Realtime-2 and GPT-Realtime-Whisper in its Realtime API, pairing stronger speech-to-speech reasoning with low-latency transcription. The update targets production voice apps that need tool use, context, and live conversation handling without falling back to text-first flows.
OpenAI is pushing voice past “transcribe then respond” into a more agentic layer where the model can reason, recover, and act while people keep talking. For builders, that shifts the bottleneck from raw latency to real-world reliability across noisy, multilingual, interruption-heavy conversations.
- GPT-Realtime-2 adds GPT-5-class reasoning, 128K context, and better tone control for live voice agents
- GPT-Realtime-Whisper gives developers streaming speech-to-text for captions, notes, and continuous understanding
- The India angle matters: better transcription and translation should help multilingual voice products handle regional accents and phonetics more cleanly
- OpenAI is also signaling production readiness with tool transparency, recovery behavior, and tighter safety guardrails
- This strengthens the case for voice as a primary app interface, not just a front-end to text chat
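To make the API shape concrete, here is a minimal sketch of the client events a voice app would send over the Realtime API's WebSocket, assuming the new models keep the existing event protocol (`session.update`, `input_audio_buffer.append`). The model id `gpt-realtime-2` in the URL is taken from the announcement and may not match the final API string; the code only builds the JSON payloads and does not open a connection.

```python
import base64
import json

# Hypothetical model id from the announcement; verify against the API docs.
REALTIME_URL = "wss://api.openai.com/v1/realtime?model=gpt-realtime-2"

def session_update(instructions: str, voice: str = "alloy") -> str:
    """Build a session.update client event configuring the voice agent."""
    return json.dumps({
        "type": "session.update",
        "session": {
            "instructions": instructions,
            "voice": voice,
            # Server-side voice activity detection handles turn taking,
            # which is what enables the interruption handling described above.
            "turn_detection": {"type": "server_vad"},
        },
    })

def append_audio(pcm_chunk: bytes) -> str:
    """Build an input_audio_buffer.append event carrying a base64 audio chunk."""
    return json.dumps({
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(pcm_chunk).decode("ascii"),
    })

# Example: the events a client would stream while the user speaks.
setup = json.loads(session_update("You are a concise support agent."))
chunk = json.loads(append_audio(b"\x00\x01" * 160))
```

The key design point is that audio is streamed incrementally as small base64 chunks rather than uploaded as a finished file, which is what keeps latency low enough for live conversation.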
Discovered: 2026-05-08
Published: 2026-05-08
Author: OpenAIDevs