MichiAI hits sub-100ms with promptable ASR
MichiAI is a 530M parameter full-duplex speech LLM that introduces natural language prompt engineering to ASR. By unifying a modified Whisper encoder with a SmolLM backbone, it enables real-time transcription primed with semantic categories and conversation history, achieving ~75ms latency for natural voice agents.
MichiAI’s "listen-head" architecture replaces the serial ASR → LLM → TTS pipeline with a single multimodal model that consumes audio directly. Instead of primitive static word boosting, it supports semantic-category priming and structural expectations expressed in natural language. A real-time feedback loop between audio embeddings and text tokens enables context-based error correction, and a self-prompting mechanism draws on conversation history to prioritize expected phonemes, contributing to the ~75ms latency.
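The priming idea above can be sketched as assembling a natural-language prompt from semantic categories and recent conversation turns, then feeding it to the decoder as context. This is a minimal illustrative sketch; the function and prompt format here are assumptions, since MichiAI's actual prompt schema is not described in the announcement.

```python
from collections import deque


def build_priming_prompt(categories, history, max_turns=3):
    """Assemble a natural-language priming prompt for a promptable ASR decoder.

    `categories` are semantic domains the transcript is expected to contain;
    `history` is a list of prior conversation turns (most recent last).
    All names here are hypothetical -- MichiAI's real prompt format is not public.
    """
    recent = list(deque(history, maxlen=max_turns))  # keep only the latest turns
    parts = []
    if categories:
        parts.append("Expect terms from: " + ", ".join(categories) + ".")
    if recent:
        parts.append("Previous turns: " + " | ".join(recent))
    return " ".join(parts)


prompt = build_priming_prompt(
    categories=["medication names", "dosages"],
    history=["Patient asked about ibuprofen.", "Nurse confirmed 400 mg dose."],
)
print(prompt)
```

In a self-prompting loop, each finalized transcript segment would be appended to `history` so the next decoding window is primed with the freshest context.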
DISCOVERED
2026-04-24
PUBLISHED
2026-04-24
AUTHOR
kwazar90