PH · PRODUCT_HUNT // 3h ago · MODEL RELEASE
Inworld Realtime TTS-2 adds voice direction
Inworld’s Realtime TTS-2 pushes voice synthesis beyond clean playback into controllable performance, with natural-language direction for tone, emotion, speed, and pitch. It also adds text-based voice design, cross-lingual synthesis across 100+ languages, and tighter pronunciation control for production use.
// ANALYSIS
This looks less like a cosmetic TTS refresh and more like a real step toward promptable voice infrastructure for agents. The pitch is simple: if voice is going to be an interface, developers need something they can steer as precisely as an LLM.
- Natural-language direction lowers the friction between product intent and audio output, which matters more than raw fidelity in most agent workflows
- Text-based voice design is useful for teams that want to prototype brand voices without recording sessions or casting
- Cross-lingual synthesis with identity preservation is the strongest differentiator for globally deployed support, companion, and tutor apps
- IPA control and better alphanumeric pronunciation target the trust breakers that still make voice products feel brittle in production
- This is a strong fit for AI agent stacks, but it is still a specialized TTS product rather than a broad platform move
// TAGS
tts · speech · voice-agent · streaming · api · realtime-tts-2
DISCOVERED
3h ago
2026-05-06
PUBLISHED
8h ago
2026-05-06
RELEVANCE
9/10
AUTHOR
[REDACTED]