Fish Speech S2 Pro open-sources 15k inline tags
REDDIT · 17d ago · OPEN-SOURCE RELEASE

Fish Audio has open-sourced S2 Pro, a 4.4B-parameter text-to-speech model with a dual-autoregressive architecture and word-level emotional control through more than 15,000 natural-language inline tags.

// ANALYSIS

Fish Speech S2 Pro is a direct shot at ElevenLabs, offering production-grade latency and deep prosodic control that were previously locked behind proprietary APIs.

  • Dual-AR architecture (4B Slow AR + 400M Fast AR, 4.4B parameters in total) balances linguistic structure with high-fidelity acoustic detail for more natural phrasing
  • A library of 15,000+ inline tags such as [whisper] and [sigh] allows granular, word-level emotional direction without external conditioning models
  • Optimized for sub-150ms latency on H100/H200 hardware, which makes it viable for real-time conversational agents and interactive gaming
  • Multilingual support for 80+ languages, trained on 10M+ hours of audio, puts it in the top tier of open-weights TTS models
  • Early user reports suggest a learning curve for local hardware optimization, but the underlying model quality is a significant leap for the open-source audio ecosystem
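
To make the inline-tag idea concrete, here is a minimal sketch of how a script with bracketed tags such as [whisper] and [sigh] could be split into tag and text tokens before synthesis. It is not Fish Speech's actual API or tokenizer: the helper name `split_inline_tags`, the regex, and the token format are all assumptions made for illustration.

```python
import re

# Hypothetical pattern: a tag is any non-empty text between square brackets.
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def split_inline_tags(script: str) -> list[tuple[str, str]]:
    """Split a script into ("tag", name) and ("text", content) tokens, in order."""
    tokens: list[tuple[str, str]] = []
    cursor = 0
    for match in TAG_PATTERN.finditer(script):
        # Plain text between the previous tag (or the start) and this tag.
        text = script[cursor:match.start()].strip()
        if text:
            tokens.append(("text", text))
        tokens.append(("tag", match.group(1).strip()))
        cursor = match.end()
    # Any text left after the final tag.
    tail = script[cursor:].strip()
    if tail:
        tokens.append(("text", tail))
    return tokens


if __name__ == "__main__":
    for kind, value in split_inline_tags("[whisper] Don't wake them. [sigh] Fine, go ahead."):
        print(f"{kind:>4}: {value}")
```

Because tags sit inline with the words they affect, a single string carries both the script and its emotional direction, which is what lets the model work without a separate conditioning input.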
// TAGS
fish-speech · tts · audio-gen · open-source · open-weights · speech · ai-audio

DISCOVERED

2026-03-26 (17d ago)

PUBLISHED

2026-03-26 (17d ago)

RELEVANCE

9/10

AUTHOR

iKontact