REDDIT // 17d ago // OPEN-SOURCE RELEASE
Fish Speech S2 Pro open-sources 15k inline tags
Fish Audio has open-sourced S2 Pro, a 4.4B-parameter text-to-speech model with a dual-autoregressive architecture and word-level emotional control through 15,000+ natural-language inline tags.
// ANALYSIS
Fish Speech S2 Pro is a direct shot at ElevenLabs, offering production-grade latency and deep prosodic control that was previously locked behind proprietary APIs.
- Dual-AR architecture (4B Slow AR + 400M Fast AR) balances linguistic structure with high-fidelity acoustic detail for more natural phrasing
- Library of 15,000+ inline tags such as [whisper] and [sigh] allows granular emotional direction without external conditioning models
- Optimized for sub-150ms latency on H100/H200 hardware, making it viable for real-time conversational agents and interactive gaming
- Multilingual support for 80+ languages, trained on 10M+ hours of audio, puts it in the top tier of open-weights TTS models
- Early user reports suggest a learning curve for optimizing on local hardware, but the underlying model quality is a significant step forward for the open-source audio ecosystem
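To make the inline-tag idea concrete, here is a minimal sketch of how a script containing bracketed tags such as [whisper] and [sigh] could be split into tag and text segments before being handed to a TTS model. This is not Fish Audio's API or tokenizer: the tag pattern, the `split_tagged_text` helper, and the sample script are all assumptions made for illustration.

```python
import re

# Assumed tag syntax: lowercase words inside square brackets, e.g. [whisper].
# The real S2 Pro tag grammar may differ; this pattern is hypothetical.
TAG_PATTERN = re.compile(r"\[([a-z][a-z ,'-]*)\]")


def split_tagged_text(text):
    """Split text into ordered ("tag", name) and ("text", words) segments."""
    segments = []
    pos = 0
    for match in TAG_PATTERN.finditer(text):
        # Plain text between the previous tag and this one.
        plain = text[pos:match.start()].strip()
        if plain:
            segments.append(("text", plain))
        segments.append(("tag", match.group(1)))
        pos = match.end()
    # Any text remaining after the last tag.
    tail = text[pos:].strip()
    if tail:
        segments.append(("text", tail))
    return segments


script = "[whisper] Don't wake them. [sigh] It has been a long night."
for kind, value in split_tagged_text(script):
    print(f"{kind:>4}: {value}")
```

A pre-pass like this can help check that every tag in a script is spelled as intended before synthesis, since a mistyped tag would otherwise likely be read aloud as ordinary text.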
// TAGS
fish-speech, tts, audio-gen, open-source, open-weights, speech, ai-audio
DISCOVERED
2026-03-26 (17d ago)
PUBLISHED
2026-03-26 (17d ago)
RELEVANCE
9/10
AUTHOR
iKontact