Fish Audio S2 open-sources expressive TTS
PH · PRODUCT_HUNT // 31d ago // OPENSOURCE RELEASE

Fish Audio has open-sourced S2, a new text-to-speech model with natural-language emotion control, native multi-speaker generation, and a production-minded streaming stack built on SGLang. It stands out because Fish shipped a full developer package — model weights, fine-tuning code, API access, benchmarks, and self-hosting paths — instead of just a flashy demo.

// ANALYSIS

This is one of the more serious voice AI releases of the year: Fish Audio is pitching S2 as an actually deployable stack, not just a consumer voice toy. The caveat is that free use is covered by a research license, so teams need to check the commercial terms before treating S2 as a permissively licensed, drop-in open-source component.

  • Natural-language inline tags like [whisper], [laugh], and custom prosody cues make S2 much more steerable than TTS systems built around fixed emotion presets
  • Fish is leaning hard on performance as a differentiator, claiming roughly 100 ms time-to-first-audio and strong streaming throughput on H200 hardware
  • The company says S2 posts leading results on Seed-TTS Eval and EmergentTTS-Eval, beating Seed-TTS, MiniMax Speech, and a GPT-4o-mini-tts baseline on multiple measures
  • Shipping the GitHub repo, Hugging Face weights, blog writeup, and hosted API together gives developers multiple adoption paths: experiment locally, self-host, or just call the service
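
The roughly 100 ms time-to-first-audio figure above is a vendor claim, so it is worth measuring on your own hardware. The sketch below shows one way to do that in Python for any streaming source that yields audio chunks. It does not use Fish Audio's actual client or API: `fake_stream` is a hypothetical stand-in that emits dummy bytes after a delay, and `time_to_first_audio` is an illustrative helper name, not part of any S2 package.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def time_to_first_audio(
    chunks: Iterable[bytes],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, int]:
    """Return (seconds until the first non-empty chunk, total bytes received)."""
    start = clock()
    first_at = None
    total = 0
    for chunk in chunks:
        # Record the moment the first audible payload arrives.
        if chunk and first_at is None:
            first_at = clock()
        total += len(chunk)
    if first_at is None:
        raise ValueError("stream produced no audio")
    return first_at - start, total


def fake_stream(text: str, first_delay: float = 0.05, n_chunks: int = 3) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming TTS client; yields dummy bytes."""
    time.sleep(first_delay)  # simulated model and network latency
    for _ in range(n_chunks):
        yield b"\x00" * 320


if __name__ == "__main__":
    # The inline tags mirror the [whisper] / [laugh] style described above.
    ttfa, size = time_to_first_audio(fake_stream("[whisper] It is late. [laugh]"))
    print(f"time to first audio: {ttfa * 1000:.0f} ms, {size} bytes")
```

The timer starts when iteration begins, so this only gives a fair number if the stream is lazy (the request is sent on the first read). For a client that sends the request eagerly, start the clock before making the call instead.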
// TAGS
fish-audio-s2 · speech · open-source · inference · api · research

DISCOVERED

31d ago

2026-03-11

PUBLISHED

33d ago

2026-03-10

RELEVANCE

8/10

AUTHOR

[REDACTED]