Inworld Realtime TTS-2 adds voice direction
MODEL RELEASE · 3h ago

Inworld’s research-preview voice model is built for realtime conversation: it conditions on prior audio and plain-English voice direction instead of treating each utterance as an isolated text-to-speech request. It also keeps a single speaker identity across 100+ languages and ships through the Inworld API and Realtime API.

// ANALYSIS

This is the right direction for voice AI: less “pretty TTS,” more controllable conversational state. The real shift is from static narration to a model that adapts to how the user actually sounds and how the developer wants the line delivered.

  • Prior audio context lets the model carry tone, pacing, and emotional state across turns instead of only reading the current sentence
  • Plain-English voice direction lowers the control burden; devs can steer delivery without wrestling with fixed emotion enums
  • Crosslingual identity preservation matters for support, games, and companion apps that need one persona across markets
  • The research-preview rollout makes this feel like a production API move, not just a demo tied to one flagship voice
  • The main proof point now is reliability: latency, pronunciation edge cases, and expressive consistency in real apps
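To make the control model above concrete, here is a minimal sketch of what a request carrying prior-audio context and a plain-English delivery instruction could look like. The field names (`voice_direction`, `audio_context`) and the helper itself are illustrative assumptions, not Inworld's documented API shape:

```python
import json


def build_tts_request(text, voice_id, direction=None, prior_audio_b64=None):
    """Assemble a JSON payload for a hypothetical realtime TTS endpoint."""
    payload = {"text": text, "voice_id": voice_id}
    if direction:
        # A free-form delivery instruction instead of a fixed emotion enum.
        payload["voice_direction"] = direction
    if prior_audio_b64:
        # Base64 audio from earlier turns, so the model can carry
        # tone, pacing, and emotional state across the conversation.
        payload["audio_context"] = prior_audio_b64
    return json.dumps(payload)


req = build_tts_request(
    "Sure, I can help with that.",
    voice_id="ava",
    direction="warm and unhurried, slightly apologetic",
)
print(req)
```

The point of the sketch is the interface, not the transport: both controls are optional strings layered onto an ordinary synthesis request, which is what keeps the developer-side burden low.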
// TAGS
inworld-tts-2 · tts · speech · voice-agent · agent · api · infrastructure

DISCOVERED

3h ago

2026-05-06

PUBLISHED

3h ago

2026-05-06

RELEVANCE

8/10

AUTHOR

inworld_ai