OPEN_SOURCE
REDDIT // 9d ago // TUTORIAL
Pocket-TTS Server Gets Stuck in Cloning Mode
This Reddit help post describes a local Pocket-TTS setup where uploading `casual.wav` from the `alba-mackenna` voice assets pushes the app into voice-cloning mode and triggers a weight-download failure. The user is trying to use Kyutai’s built-in catalog voices instead, but the UI flow makes the cloned-voice path look like the default.
// ANALYSIS
The issue reads less like a model bug and more like a product UX trap: catalog voices and cloning inputs are being conflated, so a user can accidentally enter the wrong inference path.
- Kyutai’s official Pocket-TTS docs position voice cloning as optional, while the catalog voices are meant to work without downloading cloning weights.
- The app treats an uploaded `casual.wav` as a cloning prompt rather than a voice selection, which is why it tries to load cloning assets and hits the weight-download failure.
- The practical fix is to choose a built-in voice explicitly, either in the voice selector or via `voice_url`, rather than uploading a reference WAV.
- Local AI tools need clearer mode separation: when the UI hides that boundary, users will assume a missing dependency instead of a configuration mistake.
- For developers, the real risk is support churn: a single ambiguous control can make a working open-source model feel broken.
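The mode separation argued for above can be sketched in a few lines of Python. This is an illustration of the idea, not Pocket-TTS code: `VoiceMode`, `VoiceRequest`, `build_voice_request`, and the `reference_wav` field are hypothetical names, and only `voice_url` comes from the post. The point is that a request is either a catalog request or a cloning request, and an ambiguous one is rejected before any weights are fetched.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class VoiceMode(Enum):
    CATALOG = "catalog"  # built-in voice; should not need cloning weights
    CLONING = "cloning"  # reference WAV; needs cloning weights


@dataclass(frozen=True)
class VoiceRequest:
    mode: VoiceMode
    fields: dict


def build_voice_request(
    text: str,
    voice_url: Optional[str] = None,
    reference_wav: Optional[str] = None,
) -> VoiceRequest:
    """Build a request that is unambiguously catalog OR cloning.

    `reference_wav` is a hypothetical field name used for illustration;
    check the server's own docs for the real upload parameter.
    """
    if voice_url and reference_wav:
        raise ValueError("pick a catalog voice or a reference WAV, not both")
    if voice_url:
        return VoiceRequest(VoiceMode.CATALOG, {"text": text, "voice_url": voice_url})
    if reference_wav:
        return VoiceRequest(
            VoiceMode.CLONING, {"text": text, "reference_wav": reference_wav}
        )
    raise ValueError("choose a catalog voice or upload a reference WAV")


# Placeholder value: substitute a real catalog voice identifier.
request = build_voice_request("Hello there", voice_url="example-catalog-voice")
print(request.mode.value, sorted(request.fields))
```

A UI built on a check like this would surface "you are about to enter cloning mode" at the moment a WAV is uploaded, instead of letting the user discover it through a failed download.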
// TAGS
speech · self-hosted · api · kyutai-pocket-tts
DISCOVERED
9d ago
2026-04-03
PUBLISHED
9d ago
2026-04-03
RELEVANCE
8/10
AUTHOR
Quiet_Dasy