Voxtral Voice Clone fills encoder gap
REDDIT · 13d ago · OPEN-SOURCE RELEASE


Voxtral Voice Clone is an open-source project that reconstructs the missing codec encoder for Mistral's Voxtral-4B-TTS model, which otherwise supports only 20 preset voices. The repo adds encoder-training, LoRA-distillation, and weight-injection tooling so Voxtral can accept a ref_audio input for zero-shot voice cloning.

// ANALYSIS

Community reverse-engineering like this is what turns an open-weight demo into something builders can actually use. The catch is that this is still research-heavy plumbing, not a turnkey voice-cloning product.

  • The repo targets the exact blocker in Voxtral's TTS stack: no codec encoder weights means no custom speaker embeddings from reference audio.
  • It retrains the encoder and LoRA-distills the LLM, which is the right shape if you want embeddings that still land in Voxtral's expected voice space.
  • The 80GB GPU requirement and two-phase training pipeline make this a serious research project, not a casual local install.
  • The repo is CC BY-NC 4.0, and the derivative weights still inherit Mistral's license constraints, so commercial teams need to read the fine print.
  • Sources: [Reddit](https://www.reddit.com/r/LocalLLaMA/comments/1s6rmoi/the_missing_piece_of_voxtral_tts_to_enable_voice/), [GitHub](https://github.com/Al0olo/voxtral-voice-clone)
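
The weight-injection step the analysis mentions typically amounts to the standard LoRA merge: folding the trained low-rank deltas back into the frozen base matrices so inference needs no adapter code. A minimal sketch below, with illustrative shapes and hyperparameters not taken from the repo:

```python
import numpy as np

def merge_lora(base_w, lora_a, lora_b, alpha=16.0, rank=4):
    """Fold a LoRA update into a frozen base weight matrix.

    base_w : (out, in)  frozen base weight
    lora_a : (rank, in) trained down-projection
    lora_b : (out, rank) trained up-projection

    Returns W + (alpha / rank) * B @ A, the standard LoRA merge.
    Voxtral-specific layer names and scales are not reproduced here.
    """
    return base_w + (alpha / rank) * (lora_b @ lora_a)

# Toy demonstration with small random matrices.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))
A = rng.standard_normal((4, 8))
B = rng.standard_normal((8, 4))

W_merged = merge_lora(W, A, B)

# After merging, the adapted forward pass needs no extra matmuls:
x = rng.standard_normal(8)
assert np.allclose(W_merged @ x, W @ x + 4.0 * (B @ (A @ x)))
```

Merging this way keeps serving simple (one matrix per layer) at the cost of losing the ability to hot-swap adapters.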
// TAGS
voxtral-voice-clone · speech · audio-gen · open-source · fine-tuning · self-hosted · llm

DISCOVERED

2026-03-29 (13d ago)

PUBLISHED

2026-03-29 (14d ago)

RELEVANCE

8/10

AUTHOR

al0olo