OpenAI releases GPT-Realtime-2 for voice agents
OpenAI’s new realtime voice model adds GPT-5-class reasoning, 128K context, native audio and image input, and stronger tool use for production voice workflows. OpenAI says it improves over GPT-Realtime-1.5 on Big Bench Audio and instruction-following benchmarks, with a clear push toward real agentic voice use cases.
This is a meaningful step beyond “better voice” into “voice as an agent runtime.” The real story is not just the model quality, but the combination of reasoning, tool use, and lower-friction session handling.
- Supports text, audio, and image input, with text and audio output, so it can sit in multimodal voice-agent flows
- Built around production features like parallel tool calls, async function calls, and recovery behavior, which matter more than raw TTS polish
- OpenAI is positioning it for support, assistance, education, and multilingual live workflows, not consumer chat novelty
- Pricing is still premium, so the first adopters will be teams where latency and call success rates justify the spend
- The benchmark framing suggests the market is moving from “natural-sounding speech” to “reliable spoken reasoning under tool pressure”
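To make "parallel tool calls" and "recovery behavior" concrete, here is a minimal sketch of how an application might run several model-requested tool calls concurrently and report failures per call instead of aborting the turn. It does not use the OpenAI Realtime API. The tool names (`lookup_order`, `check_refund_policy`), the call dictionary shape (`call_id`, `name`, `arguments`), and the result shape are all hypothetical, chosen only for illustration.

```python
import asyncio

# Hypothetical tools: stand-ins for real backend lookups.
async def lookup_order(order_id: str) -> dict:
    await asyncio.sleep(0.01)  # simulated I/O latency
    return {"order_id": order_id, "status": "shipped"}

async def check_refund_policy(region: str) -> dict:
    await asyncio.sleep(0.01)  # simulated I/O latency
    return {"region": region, "refund_window_days": 30}

# Hypothetical registry mapping tool names to coroutines.
TOOLS = {
    "lookup_order": lookup_order,
    "check_refund_policy": check_refund_policy,
}

async def run_tool_calls(calls: list[dict]) -> list[dict]:
    """Run all requested tool calls concurrently.

    Each failure (unknown tool, bad arguments, tool error) is captured
    in that call's result, so one bad call does not sink the others.
    Results come back in the same order as the input calls.
    """
    async def run_one(call: dict) -> dict:
        try:
            tool = TOOLS[call["name"]]
            output = await tool(**call["arguments"])
            return {"call_id": call["call_id"], "ok": True, "output": output}
        except Exception as exc:
            return {"call_id": call["call_id"], "ok": False, "error": repr(exc)}

    return await asyncio.gather(*(run_one(c) for c in calls))

if __name__ == "__main__":
    requested = [
        {"call_id": "a", "name": "lookup_order", "arguments": {"order_id": "1042"}},
        {"call_id": "b", "name": "check_refund_policy", "arguments": {"region": "EU"}},
    ]
    for result in asyncio.run(run_tool_calls(requested)):
        print(result)
```

In a voice setting, per-call error capture is what would let an agent say "I found your order but could not load the refund policy" rather than failing the whole turn. How GPT-Realtime-2 itself surfaces such results is not described in the source.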
Discovered: 2026-05-07
Published: 2026-05-07
Author: OpenAIDevs