OPEN_SOURCE
REDDIT · 13d ago · OPENSOURCE RELEASE
Voxtral Voice Clone fills encoder gap
Voxtral Voice Clone is an open-source project that reconstructs the codec encoder missing from Mistral's Voxtral-4B-TTS release. Without that encoder, the model supports only its 20 preset voices. The repo adds training, LoRA-distillation, and weight-injection tooling so that Voxtral can accept a ref_audio reference clip for zero-shot voice cloning.
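Conceptually, weight injection means adding newly trained encoder tensors to an existing checkpoint without silently overwriting the released weights. The sketch below is a dependency-free illustration that makes several assumptions. Tensors are modeled as (shape, flat values) pairs rather than PyTorch or safetensors objects. The function `inject_weights`, the key prefix `codec.encoder`, and the tensor names are all hypothetical and are not taken from the repo.

```python
from typing import Dict, List, Tuple

# Hypothetical tensor stand-in: (shape, flat list of values).
Tensor = Tuple[Tuple[int, ...], List[float]]
StateDict = Dict[str, Tensor]


def numel(shape: Tuple[int, ...]) -> int:
    """Number of elements implied by a shape."""
    n = 1
    for d in shape:
        n *= d
    return n


def inject_weights(base: StateDict, encoder: StateDict, prefix: str,
                   overwrite: bool = False) -> StateDict:
    """Return a copy of `base` with `encoder` tensors added under `prefix`.

    Hypothetical helper: refuses to clobber existing keys unless
    overwrite=True, and never changes a tensor's shape.
    """
    merged = dict(base)  # shallow copy, so `base` is left untouched
    for name, (shape, values) in encoder.items():
        if numel(shape) != len(values):
            raise ValueError(f"{name}: shape {shape} does not match {len(values)} values")
        key = f"{prefix}.{name}"
        if key in merged and not overwrite:
            raise KeyError(f"{key} already exists in base checkpoint")
        if key in merged and merged[key][0] != shape:
            raise ValueError(f"{key}: shape mismatch {merged[key][0]} vs {shape}")
        merged[key] = (shape, list(values))
    return merged


# Toy usage with hypothetical key names.
base = {"decoder.proj.weight": ((2, 2), [1.0, 0.0, 0.0, 1.0])}
encoder = {"conv.weight": ((1, 3), [0.1, 0.2, 0.3])}
merged = inject_weights(base, encoder, prefix="codec.encoder")
print(sorted(merged))  # → ['codec.encoder.conv.weight', 'decoder.proj.weight']
```

Failing loudly on existing keys is a deliberate choice here. A real pipeline would more likely load the checkpoint with its framework's own loader and compare missing and unexpected keys.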
// ANALYSIS
Community reverse-engineering like this is what turns an open-weight demo into something builders can actually use. The catch is that this is still research-heavy plumbing, not a turnkey voice-cloning product.
- The repo targets the exact blocker in Voxtral's TTS stack: without codec encoder weights, there is no way to turn reference audio into custom speaker embeddings.
- It retrains the encoder and LoRA-distills the LLM, which is the right approach if you want the resulting embeddings to land in Voxtral's expected voice space.
- The 80 GB GPU requirement and the two-phase training pipeline make this a serious research project, not a casual local install.
- The repo is licensed CC BY-NC 4.0, and derivative weights still inherit Mistral's license constraints, so commercial teams need to read the fine print.
- Sources: [Reddit](https://www.reddit.com/r/LocalLLaMA/comments/1s6rmoi/the_missing_piece_of_voxtral_tts_to_enable_voice/), [GitHub](https://github.com/Al0olo/voxtral-voice-clone)
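The "LoRA-distills the LLM" step relies on the standard low-rank update W' = W + (alpha / r) · B A. In this update, W stays frozen and only the small matrices A (r × d_in) and B (d_out × r) are trained. The pure-Python sketch below folds such an update into a weight matrix. It assumes the common alpha / r scaling, which the source does not confirm. It does not show the distillation loss, and it does not show whether the repo merges adapters or keeps them separate. `merge_lora` and the toy matrices are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain nested-list matrix product a @ b."""
    inner = len(b)
    assert all(len(row) == inner for row in a), "inner dimensions must agree"
    cols = len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(len(a))]


def merge_lora(w: Matrix, lora_a: Matrix, lora_b: Matrix, alpha: float) -> Matrix:
    """Fold a LoRA update into a frozen weight: W' = W + (alpha / r) * B @ A.

    Hypothetical helper. Shapes: w is d_out x d_in, lora_b is d_out x r,
    lora_a is r x d_in. Returns a new matrix; `w` is not modified.
    """
    r = len(lora_a)          # rank = number of rows in A
    scale = alpha / r        # common LoRA scaling (assumed, not confirmed)
    delta = matmul(lora_b, lora_a)
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]


# Toy example: rank-1 update on a 2 x 2 identity weight.
w = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[1.0, 2.0]]        # r = 1, d_in = 2
lora_b = [[0.5], [0.0]]      # d_out = 2, r = 1
merged_w = merge_lora(w, lora_a, lora_b, alpha=2.0)
print(merged_w)  # → [[2.0, 2.0], [0.0, 1.0]]
```

Merging removes the adapter's extra cost at inference time. Keeping A and B separate instead lets one base model swap between adapters.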
// TAGS
voxtral-voice-clone · speech · audio-gen · open-source · fine-tuning · self-hosted · llm
DISCOVERED: 13d ago (2026-03-29)
PUBLISHED: 14d ago (2026-03-29)
RELEVANCE: 8/10
AUTHOR: al0olo