Strix Halo User Seeks Faster Local TTS
A LocalLLaMA user running Fedora 43 on a Strix Halo machine says their local stack is already excellent for LLMs and service announcements, but they want a TTS model with more expressive, human-sounding voices. Kokoro on Docker is fast, yet Qwen3-TTS feels too slow in koboldcpp at 3+ seconds per sentence, and attempts to get Qwen3-TTS working through LocalAI and vLLM-rocm have been unsuccessful on this hardware. The post asks for practical recommendations: other local-only TTS models, or AMD-friendly Vulkan/ROCm setups that preserve low latency while improving voice quality and personality.
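For context on the latency baseline, here is a minimal sketch of measuring per-sentence TTS latency against a local Kokoro container. It assumes an OpenAI-compatible `/v1/audio/speech` endpoint on port 8880 (as common Kokoro Docker images such as kokoro-fastapi expose); the URL, model id, and voice id are placeholders to substitute for whatever your container actually serves.

```python
import time
import requests

# Hypothetical local endpoint: common Kokoro Docker images (e.g. kokoro-fastapi)
# expose an OpenAI-compatible /v1/audio/speech route on port 8880. Adjust the
# URL, model, and voice to match whatever your container actually serves.
URL = "http://localhost:8880/v1/audio/speech"

SENTENCES = [
    "The backup job finished successfully.",
    "Disk usage on the media pool is at eighty percent.",
]

for text in SENTENCES:
    start = time.perf_counter()
    resp = requests.post(
        URL,
        json={
            "model": "kokoro",    # model id is image-specific; placeholder
            "voice": "af_heart",  # example voice id; substitute one your build ships
            "input": text,
        },
        timeout=60,
    )
    resp.raise_for_status()
    elapsed = time.perf_counter() - start
    print(f"{elapsed:.2f}s  {len(resp.content)} bytes  {text!r}")
```

The same loop run against a candidate replacement backend gives an apples-to-apples read on whether a more expressive model clears the "3+ seconds per sentence" complaint.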
Hot take: this is a classic local-AI tradeoff post where the user has already solved throughput, and now quality plus backend compatibility are the real blockers.
- The core constraint is Strix Halo on AMD, so any answer has to be judged on ROCm/Vulkan viability first, not just raw model quality (a quick backend sanity check is sketched after this list).
- Kokoro is the current latency baseline here; anything meaningfully slower needs to buy a clear jump in expressiveness to be worth it.
- Qwen3-TTS is the aspirational target, but the post implies the deployment path, not the model itself, is the immediate pain point.
- This reads more like an infrastructure discussion than a product launch, since the user is asking for working local-only setups and alternatives.
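On the ROCm-viability point, the first question for any candidate backend is whether the installed PyTorch is a ROCm (HIP) build that can see the GPU at all. A minimal sketch, assuming a ROCm wheel of PyTorch is installed:

```python
import torch

# On ROCm builds of PyTorch, the torch.cuda API is backed by HIP, so
# torch.cuda.is_available() reports AMD GPUs and torch.version.hip is a
# version string rather than None.
print("HIP/ROCm build:", torch.version.hip)  # None on CUDA- or CPU-only builds
print("GPU visible:", torch.cuda.is_available())
if torch.cuda.is_available():
    # Device name depends on the driver; on Strix Halo this is the iGPU.
    print("Device:", torch.cuda.get_device_name(0))
```

For the Vulkan path, `vulkaninfo --summary` (from vulkan-tools) is the analogous smoke test before pointing a Vulkan-backed runtime at the iGPU.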
DISCOVERED 2026-04-18
PUBLISHED 2026-04-18
AUTHOR dougmaitelli