Mistral releases Voxtral 4B TTS 2603
REDDIT // 16d ago · MODEL RELEASE

Mistral released Voxtral 4B TTS 2603, its first text-to-speech model, as an open-weights release covering nine languages and aimed at low-latency voice agents. The company says native speakers preferred it over ElevenLabs Flash v2.5 in human tests, and that it can adapt to a voice from as little as 3 seconds of reference audio.

// ANALYSIS

Mistral is trying to make voice a core developer primitive, not a side demo. The real play is less about a single benchmark and more about making a production-ready voice stack easy to adopt.

  • 4B parameters and a 70 ms time-to-first-audio on a 10-second prompt make this feel built for live conversation, not canned narration.
  • According to Mistral, native speakers preferred Voxtral over ElevenLabs Flash v2.5 in human tests, which matters more than a generic score because TTS lives or dies on perceived naturalness and accent fidelity.
  • The API price of $0.016 per 1k characters is competitive, while the CC BY-NC 4.0 open-weights release gives teams a self-hosted path for experimentation and internal use, within the license's non-commercial terms.
  • vLLM Omni support and a documented single-16GB-GPU serving path lower deployment friction for smaller teams.
  • Nine languages, preset voices, and 3-second reference adaptation make it immediately relevant for support, translation, and enterprise voice agents.
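
The price and model-size figures above lend themselves to a quick back-of-the-envelope check. The sketch below is only that: the $0.016 per 1k characters price and the 4B parameter count come from the summary, while the 2-bytes-per-parameter (bf16/fp16) weight format and the 500-character reply length are assumptions made here for illustration, and the function names are hypothetical.

```python
# Rough estimates based on the figures quoted in the analysis above.
PRICE_PER_1K_CHARS_USD = 0.016   # API price quoted in the summary
PARAM_COUNT = 4_000_000_000      # "4B parameters" from the summary
BYTES_PER_PARAM = 2              # ASSUMPTION: weights stored as bf16/fp16


def api_cost_usd(num_chars: int, price_per_1k: float = PRICE_PER_1K_CHARS_USD) -> float:
    """Estimated API cost in USD for synthesizing `num_chars` characters."""
    if num_chars < 0:
        raise ValueError("num_chars must be non-negative")
    return num_chars / 1000 * price_per_1k


def weight_memory_gb(params: int = PARAM_COUNT, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Approximate memory for the weights alone, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    reply_chars = 500  # ASSUMPTION: hypothetical length of one agent reply
    print(f"one {reply_chars}-char reply: ${api_cost_usd(reply_chars):.4f}")
    print(f"1M characters: ${api_cost_usd(1_000_000):.2f}")
    print(f"weights at {BYTES_PER_PARAM} bytes/param: {weight_memory_gb():.1f} GB")
```

Under these assumptions, a million characters costs about $16 through the API, and the weights alone take roughly 8 GB, which is consistent with the single-16GB-GPU claim. Real serving memory will be higher than the weights-only figure, since it also includes activations, caches, and any audio codec components.
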
// TAGS
speech · audio-gen · open-weights · inference · voxtral-4b-tts-2603

DISCOVERED: 2026-03-26 (16d ago)
PUBLISHED: 2026-03-26 (16d ago)
RELEVANCE: 9/10
AUTHOR: Nunki08