Grok TTS tops voice humanness benchmark
Vapi’s blind-voted Humanness Index currently ranks xAI’s Grok TTS as its most human-sounding voice model, with Grok’s streaming variant also near the top. The benchmark compares cloned voices across models so listeners judge model quality, not polished vendor demos.
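The source does not say how Vapi turns blind votes into a ranking. As a minimal sketch, assume each vote is a choice between two anonymized clips from different models. A plain win-rate tally would then look like the code below. The vote format and the model labels (`model_x`, `model_y`, `model_z`) are hypothetical, and Vapi may use a different aggregation, such as an Elo-style rating.

```python
from collections import defaultdict

def win_rates(votes):
    """Tally blind pairwise votes into per-model win rates.

    Hypothetical vote format: (model_a, model_b, winner), where the listener
    heard both clips without knowing which model produced either one.
    """
    wins = defaultdict(int)
    appearances = defaultdict(int)
    for model_a, model_b, winner in votes:
        if winner not in (model_a, model_b):
            raise ValueError(f"winner {winner!r} not in pair ({model_a!r}, {model_b!r})")
        appearances[model_a] += 1
        appearances[model_b] += 1
        wins[winner] += 1
    # Share of head-to-head comparisons each model won.
    return {model: wins[model] / appearances[model] for model in appearances}

def rank(votes):
    """Sort models by win rate (highest first), breaking ties by name."""
    rates = win_rates(votes)
    return sorted(rates.items(), key=lambda kv: (-kv[1], kv[0]))

# Made-up votes for illustration only; not Vapi data.
votes = [
    ("model_x", "model_y", "model_x"),
    ("model_x", "model_z", "model_x"),
    ("model_y", "model_z", "model_z"),
    ("model_x", "model_y", "model_y"),
]
for model, rate in rank(votes):
    print(f"{model}: {rate:.2f}")  # model_x ranks first with 2 wins in 3 comparisons
```

A raw win rate is the simplest possible aggregation. It ignores opponent strength and sample size, which is why real leaderboards often prefer rating models or confidence intervals.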
Grok TTS looks like a serious voice-agent contender, but the win matters most if xAI can pair naturalness with low-latency, stable developer APIs.
- Blind voting is a useful counterweight to cherry-picked demo reels, especially in TTS where tiny artifacts break trust.
- Grok TTS leads on perceived humanness, while its streaming version trades some score for faster response time.
- Vapi’s index puts xAI directly against ElevenLabs, MiniMax, Canopy, and Inworld in a way voice-agent builders can actually compare.
- For production agents, the next question is not just realism; it is latency, cost, language coverage, and reliability under phone-quality audio.
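One way to quantify the trade-off between streaming and naturalness is to time how long a client waits for the first audio chunk. You can then compare that with how long the whole clip takes to arrive. The sketch below times any iterator of audio chunks. `fake_tts_stream` is a hypothetical stand-in with made-up delays. It is not an xAI or Vapi API, and a real client's chunk iterator would take its place.

```python
import time
from typing import Iterable, Iterator, Tuple

def measure_stream(chunks: Iterable[bytes]) -> Tuple[float, float, int]:
    """Return (time_to_first_chunk_s, total_time_s, total_bytes) for a chunk iterator."""
    start = time.perf_counter()
    first = None
    total_bytes = 0
    for chunk in chunks:
        if first is None:
            # Latency until the caller could start playing audio.
            first = time.perf_counter() - start
        total_bytes += len(chunk)
    total = time.perf_counter() - start
    if first is None:
        first = total  # empty stream: no audio ever arrived
    return first, total, total_bytes

def fake_tts_stream(n_chunks: int = 5, first_delay: float = 0.05,
                    chunk_delay: float = 0.01, chunk_size: int = 320) -> Iterator[bytes]:
    """Hypothetical streaming TTS stand-in; delays and sizes are illustrative."""
    time.sleep(first_delay)  # simulated time before the first audio is ready
    for i in range(n_chunks):
        if i:
            time.sleep(chunk_delay)
        # 320 bytes = 20 ms of 8 kHz, 16-bit mono PCM.
        yield b"\x00" * chunk_size

ttfa, total, nbytes = measure_stream(fake_tts_stream())
print(f"first audio after {ttfa * 1000:.0f} ms, full clip after {total * 1000:.0f} ms, {nbytes} bytes")
```

Because a generator does not run until it is first iterated, the timer starts before any simulated synthesis work. For a real voice agent, time to first audio usually matters more than total time, because playback can begin while later chunks are still arriving.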
Discovered: 2026-06-18
Published: 2026-06-18
Author: elonmusk