OPEN_SOURCE ↗
REDDIT // 4h ago · PRODUCT LAUNCH
Gemma 4-E2B STT hits Home Assistant hurdles
Google's new 2B parameter multimodal model, Gemma 4-E2B, is being repurposed for local Speech-to-Text (STT) in Home Assistant. While its accuracy is impressive, it natively outputs its internal "thought chain," requiring community-developed middleware to strip reasoning tags for raw transcriptions.
// ANALYSIS
Gemma 4's multimodal capabilities make it a high-performance local STT contender, but its "thoughtful" default behavior is currently a friction point for simple transcription tasks.
- Native audio support in a 2-billion parameter model allows for low-latency, high-accuracy STT on consumer GPUs, rivaling dedicated models like Parakeet.
- The model's built-in reasoning engine, while valuable for complex prompts, lacks a reliable server-side "off" switch in current llama.cpp and llama-swap implementations.
- Community members are bypassing the problem with custom FastAPI middleware that regex-strips <|channel>thought tags before the data reaches Home Assistant.
- This integration highlights the growing trend of using general-purpose multimodal LLMs as high-performance drop-in replacements for traditional specialized audio encoders.
- The combination of llama-swap and wyoming_openai remains the dominant architecture for bridging local LLM servers to the Home Assistant "Assist" pipeline.
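The tag-stripping middleware described above can be reduced to a small pure function that the FastAPI proxy applies to each transcription response before returning it to Home Assistant. The sketch below is illustrative, not the community's actual code: the delimiter strings are assumptions based on the `<|channel>thought` markers mentioned in the thread, and the exact tokens your llama.cpp build emits may differ.

```python
import re

def strip_reasoning(
    text: str,
    open_tag: str = "<|channel>thought",    # assumed reasoning-start marker
    close_tag: str = "<|channel>final",     # assumed final-answer marker
) -> str:
    """Drop the model's reasoning span, keeping only the transcription.

    Removes everything from open_tag through close_tag (inclusive).
    If the close tag never arrives (e.g. a truncated generation),
    everything from open_tag onward is dropped as a fallback.
    """
    pattern = re.compile(
        re.escape(open_tag) + r".*?" + re.escape(close_tag),
        re.DOTALL,
    )
    cleaned = pattern.sub("", text)
    # Fallback: unclosed reasoning block at the end of the output.
    cleaned = re.sub(re.escape(open_tag) + r".*\Z", "", cleaned, flags=re.DOTALL)
    return cleaned.strip()
```

In a FastAPI proxy this function would run on the upstream server's response body inside the `/v1/audio/transcriptions` handler, so Home Assistant's Assist pipeline only ever sees the raw transcription text.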
// TAGS
gemma-4-e2b · gemma-4 · llm · speech · self-hosted · home-assistant · stt
DISCOVERED
4h ago
2026-04-18
PUBLISHED
7h ago
2026-04-17
RELEVANCE
8/10
AUTHOR
andy2na