OPEN_SOURCE
MODEL RELEASE
Inworld Realtime TTS-2 adds voice direction
Inworld’s research-preview voice model is built for realtime conversation: it conditions on prior audio and plain-English voice direction rather than treating each line as isolated text-to-speech. It also keeps a single speaker identity across 100+ languages and ships through the Inworld API and Realtime API.
// ANALYSIS
This is the right direction for voice AI: less “pretty TTS,” more controllable conversational state. The real shift is from static narration to a model that adapts to how the user actually sounds and how the developer wants the line delivered.
- Prior audio context lets the model carry tone, pacing, and emotional state across turns instead of only reading the current sentence
- Plain-English voice direction lowers the control burden; devs can steer delivery without wrestling with fixed emotion enums
- Crosslingual identity preservation matters for support, games, and companion apps that need one persona across markets
- The research-preview rollout makes this feel like a production API move, not just a demo tied to one flagship voice
- The main proof point now is reliability: latency, pronunciation edge cases, and expressive consistency in real apps
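The interaction model described above, prior-audio conditioning plus free-form voice direction, can be sketched as a request payload. This is a hypothetical shape for illustration only: the endpoint, field names (`direction`, `context_audio`), and helper are assumptions, not Inworld's actual API schema.

```python
import base64

def build_tts_request(text, direction, prior_audio_bytes=None,
                      voice_id="default", language="en"):
    """Assemble a hypothetical realtime-TTS request payload.

    `direction` is a plain-English delivery instruction (e.g.
    "whisper, slightly amused"); `prior_audio_bytes` carries the
    previous conversational turn so the model can condition on tone
    and pacing. All field names here are illustrative.
    """
    payload = {
        "text": text,
        "voice_id": voice_id,    # one identity, reused across languages
        "language": language,
        "direction": direction,  # free-form style control, no emotion enum
    }
    if prior_audio_bytes is not None:
        # Prior-turn audio, base64-encoded for JSON transport
        payload["context_audio"] = base64.b64encode(prior_audio_bytes).decode("ascii")
    return payload

req = build_tts_request(
    "Of course, let me check that for you.",
    direction="calm and apologetic, slower pace",
    prior_audio_bytes=b"\x00\x01",  # stand-in for the user's last turn
)
```

The point of the sketch is the control surface: delivery is steered by a natural-language string and by conversational context, not by picking from a fixed list of emotions.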
// TAGS
inworld-tts-2 · tts · speech · voice-agent · agent · api · infrastructure
DISCOVERED
2026-05-06
PUBLISHED
2026-05-06
RELEVANCE
8/10
AUTHOR
inworld_ai