Gemma 4-E2B STT hits Home Assistant hurdles
OPEN_SOURCE · REDDIT · 4h ago · PRODUCT LAUNCH


Google's new 2B-parameter multimodal model, Gemma 4-E2B, is being repurposed for local Speech-to-Text (STT) in Home Assistant. While its accuracy is impressive, it emits its internal "thought chain" by default, so community-developed middleware is needed to strip the reasoning tags and return clean transcriptions.
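The stripping step itself is straightforward. A minimal sketch of the tag-removal logic follows; the exact delimiter strings ("<|channel>thought" ... "<|channel>final") are an assumption based on the tag mentioned in the post, and the real model output may differ.

```python
import re

# Assumed reasoning delimiters; adjust to match the model's actual output.
THOUGHT_BLOCK = re.compile(
    r"<\|channel>thought.*?(?:<\|channel>final|\Z)",
    re.DOTALL,
)

def strip_thoughts(raw: str) -> str:
    """Drop the model's reasoning block, keeping only the transcription."""
    return THOUGHT_BLOCK.sub("", raw).strip()
```

If no thought markers are present, the text passes through unchanged, so the same function is safe to apply to every response.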

// ANALYSIS

Gemma 4's multimodal capabilities make it a high-performance local STT contender, but its "thoughtful" default behavior is currently a friction point for simple transcription tasks.

  • Native audio support in a 2-billion parameter model allows for low-latency, high-accuracy STT on consumer GPUs, rivaling dedicated models like Parakeet.
  • The model’s built-in reasoning engine, while valuable for complex prompts, lacks a reliable server-side "off" switch in current llama.cpp and llama-swap implementations.
  • Community members are bypassing the problem with custom FastAPI middleware that regex-strips <|channel>thought tags before the data reaches Home Assistant.
  • This integration highlights the growing trend of using general-purpose multimodal LLMs as high-performance drop-in replacements for traditional specialized audio encoders.
  • The combination of llama-swap and wyoming_openai remains the dominant architecture for bridging local LLM servers to the Home Assistant "Assist" pipeline.
// TAGS
gemma-4-e2b · gemma-4 · llm · speech · self-hosted · home-assistant · stt

DISCOVERED: 4h ago (2026-04-18)

PUBLISHED: 7h ago (2026-04-17)

RELEVANCE: 8/10

AUTHOR: andy2na