CosyVoice 3 setup woes, voice cloning drops words

// 65d ago · OPENSOURCE RELEASE

CosyVoice 3 is pitched as a zero-shot multilingual TTS model with low-latency streaming, but a Reddit user says the local install is still brittle and the generated speech can skip or reorder words. The thread captures the gap between a strong demo and a clean local voice-cloning workflow.

// ANALYSIS

CosyVoice 3 looks powerful, but it still behaves like research code wrapped in a product pitch. The hard part is less about installing the model than about matching the exact checkpoint variant, prompt format, and serving path.

  • The repo specifically recommends the `Fun-CosyVoice3-0.5B` checkpoint for better performance, but the install path still starts with `git clone --recursive`, a Python 3.10 conda env, and optional `ttsfrd` normalization, which is a lot for newcomers ([CosyVoice repo](https://github.com/FunAudioLLM/CosyVoice)).
  • The Hugging Face model card uses an assistant-style prefix plus an explicit end-of-prompt token, so the published examples are more structured than a plain text-in, audio-out call ([model card](https://huggingface.co/FunAudioLLM/Fun-CosyVoice3-0.5B-2512)).
  • The repo issue tracker has a report of missing words and word-order drift in `inference_zero_shot`, which lines up closely with the failure mode described in the Reddit post ([issue #1302](https://github.com/FunAudioLLM/CosyVoice/issues/1302)).
  • CosyVoice 3 does offer vLLM and TensorRT-LLM deployment paths, but that reinforces the point: the speed story is a serving problem as much as a model problem ([CosyVoice repo](https://github.com/FunAudioLLM/CosyVoice)).
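
One way to catch the dropped-word and word-order failures described above is to transcribe the generated audio with any speech recognizer and diff the transcript against the input text. The sketch below covers only the diff step and assumes the transcript is already available as a string. `normalize` and `compare_words` are illustrative names, not part of CosyVoice, and the tokenizer is an assumption that handles only Latin-script text.

```python
import difflib
import re


def normalize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def compare_words(target: str, transcript: str) -> dict[str, list[str]]:
    """Compare the text sent to TTS with a transcript of the generated audio.

    Returns words missing from the transcript, extra words in it, and words
    that appear in both lists, which usually means they were moved.
    """
    want, got = normalize(target), normalize(transcript)
    matcher = difflib.SequenceMatcher(a=want, b=got, autojunk=False)
    missing: list[str] = []
    extra: list[str] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            missing.extend(want[i1:i2])  # in the target, not in the audio
        if tag in ("insert", "replace"):
            extra.extend(got[j1:j2])  # in the audio, not in the target
    moved = sorted(set(missing) & set(extra))
    return {"missing": missing, "extra": extra, "moved": moved}


if __name__ == "__main__":
    # Hypothetical transcript of a clip that skipped one word.
    print(compare_words("The quick brown fox jumps.", "the brown fox jumps"))
```

A non-empty `missing` list flags a clip worth regenerating, and a non-empty `moved` list points to reordering rather than outright omission. Recognition errors will also show up as differences, so treat the result as a screening signal rather than proof of a synthesis fault.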
// TAGS
speech, audio-gen, llm, open-source, self-hosted, inference, cosyvoice-3

DISCOVERED

65d ago

2026-03-24

PUBLISHED

65d ago

2026-03-23

RELEVANCE

8/10

AUTHOR

SciData777