OPEN_SOURCE
REDDIT // 9d ago · OPENSOURCE RELEASE
OmniVoice TTS model hits ComfyUI for voice cloning
A community-built wrapper integrates k2-fsa's OmniVoice, a zero-shot TTS model supporting over 600 languages, directly into ComfyUI. The node enables high-quality voice cloning from a three-second audio seed, provided users supply an accompanying transcription of the reference audio.
// ANALYSIS
This wrapper highlights ComfyUI's rapid evolution from an image generation tool into a unified hub for complex, multimodal AI workflows.
- The underlying OmniVoice model supports zero-shot cloning across 600+ languages from just a 3-second audio seed.
- While the cloning process requires manual transcriptions of the seed audio, users are automating this by chaining the node with ComfyUI-Whisper.
- With a VRAM footprint of roughly 6.5GB, the model remains highly accessible for local deployment on standard consumer hardware like an RTX 3060.
- The wrapper abstracts away complex dependencies from the base repository, handling automatic downloads for models and transcription pipelines.
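The transcribe-then-clone chaining described above can be sketched in plain Python. This is a minimal illustration of the data flow only, not the wrapper's or OmniVoice's actual API: `CloneRequest`, `clone_with_auto_transcript`, and the two stub callables are hypothetical names introduced here, standing in for a Whisper-style transcription node and the TTS node.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical signatures, assumed for illustration only.
Transcriber = Callable[[str], str]          # reference audio path -> transcript
Cloner = Callable[[str, str, str], bytes]   # (ref audio, ref transcript, target text) -> audio


@dataclass
class CloneRequest:
    reference_audio: str  # path to the short seed clip
    target_text: str      # text to speak in the cloned voice


def clone_with_auto_transcript(
    req: CloneRequest, transcribe: Transcriber, clone: Cloner
) -> bytes:
    """Transcribe the seed clip, then pass clip + transcript + target text to the cloner."""
    transcript = transcribe(req.reference_audio).strip()
    if not transcript:
        # Cloning needs the seed audio's text, so fail early if it is missing.
        raise ValueError("empty reference transcript")
    return clone(req.reference_audio, transcript, req.target_text)


# Stand-in stubs so the sketch runs without any model installed.
def fake_transcribe(path: str) -> str:
    return " hello there "


def fake_clone(ref_audio: str, ref_text: str, target_text: str) -> bytes:
    return f"{ref_audio}|{ref_text}|{target_text}".encode()


result = clone_with_auto_transcript(
    CloneRequest("seed.wav", "Good morning"), fake_transcribe, fake_clone
)
```

In a real ComfyUI graph the same shape would be wired between nodes rather than function calls: the transcription node's text output feeds the TTS node's reference-transcript input, alongside the seed audio.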
// TAGS
omnivoice comfyui speech audio-gen open-source
DISCOVERED
9d ago
2026-04-02
PUBLISHED
9d ago
2026-04-02
RELEVANCE
7/10
AUTHOR
Altruistic_Heat_9531