Scenema Audio drops open-source emotional voice cloning

// 1h ago · OPENSOURCE RELEASE

Scenema Audio is a new open-source, zero-shot voice cloning model that decouples voice identity from emotional performance. Built on Gemma 3 and an LTX diffusion transformer, it uses XML-style stage directions to make any cloned voice perform complex emotions like rage or grief alongside scene-aware background audio.

// ANALYSIS

Scenema shifts the TTS focus from phonetic accuracy to performance, targeting the "robotic" delivery common in open-source audio generators. Because identity is decoupled from emotion, you can clone a flat 10-second reference clip and make that voice scream, whisper, or cry. XML-based action tags give creators fine-grained control over mid-sentence emotional shifts, pacing, and breath control. Co-generating speech and ambient environmental audio in a single pass simplifies audio-first video generation workflows, since speech and ambience no longer need to be produced separately. The 16GB VRAM requirement puts this kind of performative audio within reach of developers on consumer hardware.
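To make the idea of XML-style stage directions concrete, here is a minimal sketch of how a tagged script could be split into per-emotion segments before synthesis. The tag names (`scene`, `emotion`, `pause`), the `name` attribute, and the `parse_script` helper are all hypothetical and chosen for illustration; they are not taken from Scenema Audio's actual schema or API.

```python
import xml.etree.ElementTree as ET

# Hypothetical script: tag and attribute names are illustrative only,
# not Scenema Audio's real markup.
SCRIPT = (
    "<scene>"
    "I told you <emotion name=\"rage\">never to come back here!</emotion> "
    "<pause/>"
    "<emotion name=\"grief\">But you never listened.</emotion>"
    "</scene>"
)


def parse_script(xml_text):
    """Flatten an XML-style script into (emotion, text) segments in reading order."""
    root = ET.fromstring(xml_text)
    segments = []

    def add(emotion, text):
        # Skip missing or whitespace-only text between tags.
        if text and text.strip():
            segments.append((emotion, text.strip()))

    add("neutral", root.text)  # untagged text before the first tag
    for child in root:
        if child.tag == "emotion":
            add(child.get("name", "neutral"), child.text)
        elif child.tag == "pause":
            segments.append(("pause", ""))
        add("neutral", child.tail)  # untagged text after this tag
    return segments


segments = parse_script(SCRIPT)
for emotion, text in segments:
    print(f"{emotion:>8}: {text}")
```

Each segment carries its own emotion label, which is how a mid-sentence shift (neutral to rage here) would be represented in this sketch.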

// TAGS
scenema-audio, tts, speech, audio-gen, open-weights, open-source

DISCOVERED: 1h ago (2026-05-17)
PUBLISHED: 2h ago (2026-05-17)
RELEVANCE: 8/10
AUTHOR: AI Search