qwen3-asr-swift ships full local speech stack
qwen3-asr-swift is an open-source Swift toolkit for Apple Silicon that bundles ASR, TTS, speech-to-speech, VAD, diarization, alignment, and enhancement into one fully local stack. Its main pitch is practical on-device orchestration: large models run through MLX on GPU while lighter components use CoreML on the Neural Engine, enabling concurrent speech pipelines without cloud dependency.
This is the kind of project that makes Apple Silicon look like a serious edge-AI speech platform rather than just a good Whisper laptop.
- The project goes beyond single-model demos by exposing 11 models behind shared Swift protocols, which makes pipeline composition a real engineering feature instead of a README promise
- Splitting workloads between MLX and CoreML is the sharpest idea here, because it targets the actual bottleneck in local speech apps: resource contention between always-on audio tasks and larger generative models
- The inclusion of diarization, enhancement, alignment, CLI tools, and an HTTP server makes this feel closer to a deployable speech stack than a narrow model wrapper
- Benchmarks like faster-than-real-time ASR and low-latency streaming TTS matter because they make the repo useful for product builders, not just ML hobbyists
- If the maintainer lands the roadmap items around meeting transcription, streaming diarization, and OpenAI-compatible audio APIs, this could become a strong alternative to fragmented Apple-side speech tooling
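To make the "shared Swift protocols" point concrete, here is a minimal sketch of protocol-based pipeline composition. Every name in it (`VoiceActivityDetector`, `SpeechRecognizer`, `EnergyVAD`, `StubRecognizer`, `TranscriptionPipeline`) is hypothetical and chosen for illustration; none of it is taken from the qwen3-asr-swift API, and the stub stages do no real inference.

```swift
// Hypothetical protocols for illustration only; NOT the qwen3-asr-swift API.
protocol VoiceActivityDetector {
    /// Returns the sample ranges judged to contain speech.
    func speechRanges(in samples: [Float]) -> [Range<Int>]
}

protocol SpeechRecognizer {
    /// Returns a transcript for one speech segment.
    func transcribe(_ samples: [Float]) -> String
}

// Toy stand-in: groups consecutive samples whose magnitude exceeds a threshold.
struct EnergyVAD: VoiceActivityDetector {
    var threshold: Float = 0.1

    func speechRanges(in samples: [Float]) -> [Range<Int>] {
        var ranges: [Range<Int>] = []
        var start: Int? = nil
        for (i, sample) in samples.enumerated() {
            let active = abs(sample) > threshold
            if active, start == nil { start = i }
            if !active, let begin = start {
                ranges.append(begin..<i)
                start = nil
            }
        }
        // Close a segment that runs to the end of the buffer.
        if let begin = start { ranges.append(begin..<samples.count) }
        return ranges
    }
}

// Toy stand-in: reports the segment length instead of running a model.
struct StubRecognizer: SpeechRecognizer {
    func transcribe(_ samples: [Float]) -> String {
        "<\(samples.count) samples>"
    }
}

// The pipeline depends only on the protocols, so either stage can be swapped.
struct TranscriptionPipeline<V: VoiceActivityDetector, R: SpeechRecognizer> {
    let vad: V
    let recognizer: R

    func run(on samples: [Float]) -> [String] {
        vad.speechRanges(in: samples).map { recognizer.transcribe(Array(samples[$0])) }
    }
}

let pipeline = TranscriptionPipeline(vad: EnergyVAD(), recognizer: StubRecognizer())
let audio: [Float] = [0, 0.5, 0.6, 0, 0, 0.9, 0]
print(pipeline.run(on: audio))  // prints ["<2 samples>", "<1 samples>"]
```

Because the pipeline is generic over the two protocols, a heavier recognizer or a different detector could replace the stubs without touching the composition code, which is the property the first bullet credits the project with.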
DISCOVERED
2026-03-06
PUBLISHED
2026-03-06
AUTHOR
ivan_digital