OmniVoice TTS model hits ComfyUI for voice cloning
REDDIT // 9d ago // OPENSOURCE RELEASE

A community-built wrapper integrates k2-fsa's OmniVoice, a zero-shot TTS model supporting over 600 languages, directly into ComfyUI. The node enables high-quality voice cloning from a three-second audio seed, provided users supply an accompanying transcription of the reference audio.

// ANALYSIS

This wrapper highlights ComfyUI's rapid evolution from an image generation tool into a unified hub for complex, multimodal AI workflows.

  • The underlying OmniVoice model supports zero-shot cloning across 600+ languages from just a 3-second audio seed.
  • Cloning requires a transcript of the seed audio; rather than typing it by hand, users are automating this step by chaining the node with ComfyUI-Whisper.
  • With a VRAM footprint of roughly 6.5GB, the model remains highly accessible for local deployment on standard consumer hardware like an RTX 3060.
  • The wrapper abstracts away complex dependencies from the base repository, handling automatic downloads for models and transcription pipelines.
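
The chaining described above, where a transcription step supplies the reference text that cloning needs, can be sketched in plain Python. This is only an illustration of the data flow, not the wrapper's actual code: `ReferenceClip`, `prepare_reference`, `clone_voice`, and both stub functions are hypothetical names, with the stubs standing in for a Whisper-style transcription node and the OmniVoice node.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ReferenceClip:
    """Hypothetical container for the seed audio and its transcript."""
    path: str
    transcript: Optional[str] = None


def prepare_reference(clip: ReferenceClip,
                      transcribe: Callable[[str], str]) -> ReferenceClip:
    """Fill in a missing transcript by calling a speech-to-text function."""
    if not clip.transcript:
        clip.transcript = transcribe(clip.path).strip()
    return clip


def clone_voice(clip: ReferenceClip,
                text: str,
                transcribe: Callable[[str], str],
                synthesize: Callable[[str, str, str], str]) -> str:
    """Chain transcription into synthesis, mirroring the two-node workflow."""
    ready = prepare_reference(clip, transcribe)
    return synthesize(ready.path, ready.transcript, text)


# Stand-in stubs (assumptions): real nodes would run models here.
def fake_transcribe(path: str) -> str:
    return " hello from the seed clip "


def fake_synthesize(ref_path: str, ref_text: str, text: str) -> str:
    return f"<audio ref={ref_path!r} ref_text={ref_text!r} says={text!r}>"


result = clone_voice(ReferenceClip("seed.wav"), "Good morning.",
                     fake_transcribe, fake_synthesize)
print(result)
```

If a clip already carries a transcript, `prepare_reference` leaves it untouched, so the transcription step only runs when the text is missing.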
// TAGS
omnivoice · comfyui · speech · audio-gen · open-source

DISCOVERED

9d ago

2026-04-02

PUBLISHED

9d ago

2026-04-02

RELEVANCE

7/10

AUTHOR

Altruistic_Heat_9531