OPEN_SOURCE
REDDIT // 37d ago · OPEN-SOURCE RELEASE
qwen3-asr-swift ships full local speech stack
qwen3-asr-swift is an open-source Swift toolkit for Apple Silicon that bundles ASR, TTS, speech-to-speech, VAD, diarization, alignment, and enhancement into one fully local stack. Its main pitch is practical on-device orchestration: large models run through MLX on GPU while lighter components use CoreML on the Neural Engine, enabling concurrent speech pipelines without cloud dependency.
// ANALYSIS
This is the kind of project that makes Apple Silicon look like a serious edge-AI speech platform, rather than just a good laptop for running Whisper.
- The project goes beyond single-model demos by exposing 11 models behind shared Swift protocols, which makes pipeline composition a real engineering feature instead of a README promise
- Splitting workloads between MLX and CoreML is the sharpest idea here, because it targets the actual bottleneck in local speech apps: resource contention between always-on audio tasks and larger generative models
- The inclusion of diarization, enhancement, alignment, CLI tools, and an HTTP server makes this feel closer to a deployable speech stack than a narrow model wrapper
- Benchmarks like sub-real-time ASR and low-latency streaming TTS matter because they make the repo useful for product builders, not just ML hobbyists
- If the maintainer lands the roadmap items around meeting transcription, streaming diarization, and OpenAI-compatible audio APIs, this could become a strong alternative to fragmented Apple-side speech tooling
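The shared-protocol and backend-split points above can be sketched in plain Swift. This is a hypothetical illustration, not the toolkit's actual API: the names (`SpeechComponent`, `ComputeBackend`, `VAD`, `ASR`) and the stub logic are assumptions; the real project would back these with MLX and CoreML model calls.

```swift
import Foundation

// Hypothetical sketch of the design the analysis describes: one protocol
// spanning all speech components, so pipelines compose regardless of
// whether a model runs via MLX (GPU) or CoreML (Neural Engine).
enum ComputeBackend {
    case mlxGPU      // larger generative models (ASR decode, TTS)
    case coreMLANE   // lightweight always-on components (VAD, enhancement)
}

protocol SpeechComponent {
    associatedtype Input
    associatedtype Output
    var backend: ComputeBackend { get }
    func process(_ input: Input) async throws -> Output
}

// Stub conformances — real implementations would invoke actual models.
struct VAD: SpeechComponent {
    let backend: ComputeBackend = .coreMLANE
    func process(_ input: [Float]) async throws -> Bool {
        // Stand-in for a CoreML voice-activity model: any sample above
        // a small amplitude threshold counts as speech.
        input.contains { abs($0) > 0.01 }
    }
}

struct ASR: SpeechComponent {
    let backend: ComputeBackend = .mlxGPU
    func process(_ input: [Float]) async throws -> String {
        // Stand-in for MLX-based decoding on the GPU.
        "transcript"
    }
}

// The contention-avoiding pattern: the cheap ANE-side VAD gates the
// expensive GPU-side ASR, so GPU time is spent only on speech segments.
func transcribe(_ audio: [Float], vad: VAD, asr: ASR) async throws -> String? {
    guard try await vad.process(audio) else { return nil }
    return try await asr.process(audio)
}
```

The gating shape is the point: because both stages share one protocol, swapping the stubs for real MLX/CoreML-backed models would not change the pipeline code.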
// TAGS
qwen3-asr-swift · speech · open-source · devtool · inference · api
DISCOVERED
2026-03-06
PUBLISHED
2026-03-06
RELEVANCE
8/10
AUTHOR
ivan_digital