MichiAI hits sub-100ms with promptable ASR
MichiAI is a 530M parameter full-duplex speech LLM that introduces natural language prompt engineering to ASR. By unifying a modified Whisper encoder with a SmolLM backbone, it enables real-time transcription primed with semantic categories and conversation history, achieving ~75ms latency for natural voice agents.
MichiAI’s "listen-head" architecture replaces the serial ASR → LLM → TTS pipeline with a single multimodal model that consumes audio directly. Instead of primitive static word boosting, it supports semantic-category priming and structural expectations expressed in natural language. A real-time feedback loop between audio embeddings and text tokens enables context-based error correction, and a self-prompting mechanism draws on conversation history to prioritize expected phonemes, contributing to the ~75ms latency.
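The priming idea above can be sketched as assembling a natural-language prompt from semantic categories and recent conversation turns, then feeding it to the decoder as context. This is a minimal illustrative sketch; the function and prompt format here are assumptions, since MichiAI's actual prompt schema is not described in the announcement.

```python
from collections import deque


def build_priming_prompt(categories, history, max_turns=3):
    """Assemble a natural-language priming prompt for a promptable ASR decoder.

    `categories` are semantic domains the transcript is expected to contain;
    `history` is a list of prior conversation turns (most recent last).
    All names here are hypothetical -- MichiAI's real prompt format is not public.
    """
    recent = list(deque(history, maxlen=max_turns))  # keep only the latest turns
    parts = []
    if categories:
        parts.append("Expect terms from: " + ", ".join(categories) + ".")
    if recent:
        parts.append("Previous turns: " + " | ".join(recent))
    return " ".join(parts)


prompt = build_priming_prompt(
    categories=["medication names", "dosages"],
    history=["Patient asked about ibuprofen.", "Nurse confirmed 400 mg dose."],
)
print(prompt)
```

In a self-prompting loop, each finalized transcript segment would be appended to `history` so the next decoding window is primed with the freshest context.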
DISCOVERED
2026-04-24
PUBLISHED
2026-04-24
AUTHOR
kwazar90