Pocket-TTS Server Gets Stuck in Cloning Mode
OPEN_SOURCE
REDDIT // 9d ago · TUTORIAL

This Reddit help post describes a local Pocket-TTS setup in which uploading `casual.wav` from the `alba-mackenna` voice assets pushes the app into voice-cloning mode and triggers a failed weight download. The user wants one of Kyutai's built-in catalog voices, but the UI flow makes the voice-cloning path look like the default.

// ANALYSIS

The issue looks less like a model bug than a UX trap: the UI conflates catalog voices with cloning inputs, so a user can land on the wrong inference path without realizing it.
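One way to avoid that conflation is to have the server pick the inference path explicitly. The sketch below is hypothetical, not Pocket-TTS code: it assumes a `voice_url` field that carries either a catalog voice name or a URL to reference audio, and a `voice_wav` field that carries uploaded bytes. `VoiceMode`, `CATALOG_VOICES` (including the names in it), and `resolve_voice_mode` are illustrative names. The function states which mode a request maps to and rejects ambiguous input instead of drifting into cloning.

```python
from enum import Enum
from typing import Optional


class VoiceMode(Enum):
    CATALOG = "catalog"  # built-in voice: no cloning weights required
    CLONE = "clone"      # reference audio: cloning weights must be available


# Hypothetical catalog names; a real server would read its own voice list.
CATALOG_VOICES = frozenset({"alba", "marius", "javert"})


def resolve_voice_mode(voice_url: Optional[str] = None,
                       voice_wav: Optional[bytes] = None) -> VoiceMode:
    """Pick the inference path explicitly and reject ambiguous requests."""
    if voice_url is not None and voice_wav is not None:
        raise ValueError("send either voice_url or voice_wav, not both")
    if voice_wav is not None:
        # Any uploaded file is a cloning prompt, even one copied
        # from a catalog voice's own asset folder (e.g. casual.wav).
        return VoiceMode.CLONE
    if voice_url is None:
        raise ValueError("no voice selected")
    if voice_url in CATALOG_VOICES:
        return VoiceMode.CATALOG
    # Anything else is treated as a URL to reference audio, i.e. cloning.
    return VoiceMode.CLONE
```

Under this split, the Reddit user's upload lands in `VoiceMode.CLONE` by design, and the UI can say so, or check for cloning weights, before any download starts rather than surfacing a weight error that looks like breakage.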

  • Kyutai’s official Pocket-TTS docs position voice cloning as optional, while the catalog voices are meant to work without downloading cloning weights.
  • To the app, an uploaded `casual.wav` is a cloning prompt rather than a voice selection, even though the file comes from a catalog voice's assets; that is why it tries to fetch the voice-state assets needed to clone from that audio.
  • The practical fix is to choose a built-in voice explicitly, either in the voice selector or via `voice_url`, rather than uploading a reference WAV.
  • Local AI tools need clear mode separation: when the UI hides that boundary, users tend to suspect a missing dependency instead of a configuration mistake.
  • For developers, the real risk is support churn: a single ambiguous control can make a working open-source model feel broken.
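The explicit-selection fix can be sketched from the client side. This assumes a Pocket-TTS server at `http://localhost:8000` exposing a `/tts` endpoint that accepts URL-encoded form fields `text` and `voice_url`; the address, path, field encoding, and the voice name `alba` are all assumptions to check against your own server, and `build_catalog_request` is a hypothetical helper. The point is that the request names a catalog voice and carries no audio file, so nothing in it can be read as a cloning prompt.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed local endpoint; check the address your server reports at startup.
TTS_ENDPOINT = "http://localhost:8000/tts"


def build_catalog_request(text: str, voice: str) -> Request:
    """Build a POST that selects a built-in voice by name instead of uploading a WAV."""
    # Only plain form fields are sent: `voice_url` names the catalog voice,
    # and no file part is attached that the server could treat as reference audio.
    body = urlencode({"text": text, "voice_url": voice}).encode("utf-8")
    return Request(
        TTS_ENDPOINT,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

To send it, pass the result of `build_catalog_request("Hello", "alba")` to `urllib.request.urlopen`. If the server expects multipart form data instead, keep the same two fields and change only the encoding.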
// TAGS
speech · self-hosted · api · kyutai-pocket-tts

DISCOVERED

2026-04-03

PUBLISHED

2026-04-03

RELEVANCE

8/10

AUTHOR

Quiet_Dasy