Gemma 4 E4B vision falls short of Qwen
OPEN_SOURCE
REDDIT // 5d ago // BENCHMARK RESULT

Reddit's LocalLLaMA community is reporting that Google's new "Effective" 4B model significantly underperforms in visual reasoning tasks compared to competitors like Qwen 3.5-4B. Despite strong official benchmarks, real-world tests show a major gap in OCR and spatial inference, raising questions about the "Effective" parameter architecture's multimodal alignment for edge devices.

// ANALYSIS

Gemma 4's "Effective" architecture may be hitting a multimodal bottleneck where its 4.5B active parameters can't match the visual reasoning depth of its 8B-equivalent text performance.

  • User benchmarks show Gemma 4 E4B scoring nearly 50% lower than Qwen 3.5-4B on complex vision test suites.
  • Initial llama.cpp support (build 8680) appears unstable, with users reporting failures to return answers even with recommended token settings.
  • The model's Per-Layer Embeddings (PLE) trick seems to prioritize text coherence over robust image-text alignment.
  • Local developers are already pivoting back to Qwen or stepping up to the 26B Gemma 4 variant for reliable production vision.
  • This highlights a growing "benchmark-vs-reality" gap for edge-optimized multimodal models.
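The "benchmark-vs-reality" gap the bullets describe is straightforward to quantify locally. The sketch below is hypothetical: the task set, model answers, and exact-match scoring are placeholders for illustration, not the community's actual test suite. It shows how a per-model score and the relative gap between two models might be computed:

```python
# Minimal sketch of a local vision-QA scoring harness.
# All tasks and answers below are made-up placeholders, not real benchmark data.

def exact_match_score(predictions: dict[str, str], gold: dict[str, str]) -> float:
    """Fraction of tasks where the model's answer matches the reference exactly."""
    hits = sum(predictions[t].strip().lower() == gold[t].strip().lower() for t in gold)
    return hits / len(gold)

def relative_gap(score_a: float, score_b: float) -> float:
    """How far model A falls below model B, as a fraction of B's score."""
    return (score_b - score_a) / score_b

# Hypothetical OCR and spatial-inference tasks with reference answers.
gold = {"ocr_1": "invoice 4821", "spatial_1": "left of the mug", "ocr_2": "route 66"}

gemma_answers = {"ocr_1": "invoice 4821", "spatial_1": "right of the mug", "ocr_2": "rte 66"}
qwen_answers  = {"ocr_1": "invoice 4821", "spatial_1": "left of the mug", "ocr_2": "route 66"}

gemma = exact_match_score(gemma_answers, gold)  # 1 of 3 correct
qwen = exact_match_score(qwen_answers, gold)    # 3 of 3 correct
print(f"relative gap: {relative_gap(gemma, qwen):.0%}")  # prints "relative gap: 67%"
```

Swapping in real model outputs (e.g. from a local llama.cpp run) turns this into a quick sanity check against published benchmark numbers.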
// TAGS
gemma-4-e4b · llm · multimodal · benchmark · open-weights · google

DISCOVERED

2026-04-07

PUBLISHED

2026-04-06

RELEVANCE

8/10

AUTHOR

specji