OPEN_SOURCE
REDDIT // 5d ago · BENCHMARK RESULT
Gemma 4 E4B vision falls short of Qwen
Reddit's LocalLLaMA community is reporting that Google's new "Effective" 4B model significantly underperforms in visual reasoning tasks compared to competitors like Qwen 3.5-4B. Despite strong official benchmarks, real-world tests show a major gap in OCR and spatial inference, raising questions about the "Effective" parameter architecture's multimodal alignment for edge devices.
// ANALYSIS
Gemma 4's "Effective" architecture may be hitting a multimodal bottleneck where its 4.5B active parameters can't match the visual reasoning depth of its 8B-equivalent text performance.
- User benchmarks show Gemma 4 E4B scoring nearly 50% lower than Qwen 3.5-4B on complex vision test suites (a minimal way to run this kind of side-by-side check is sketched after this list).
- Initial llama.cpp support (build 8680) appears unstable, with users reporting failures to return answers even with the recommended token settings.
- The model's Per-Layer Embeddings (PLE) trick seems to prioritize text coherence over robust image-text alignment.
- Local developers are already pivoting back to Qwen or stepping up to the 26B Gemma 4 variant for reliable production vision.
- This highlights a growing "benchmark-vs-reality" gap for edge-optimized multimodal models.
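One way to sanity-check these community reports on your own hardware is to send identical OCR prompts to both models and compare answers. The sketch below is a minimal, hypothetical harness: it assumes two local llama.cpp `llama-server` instances (one per model, started with multimodal support) exposing the OpenAI-compatible `/v1/chat/completions` API; the ports, model names, image files, and expected strings are all placeholders for your own test set.

```python
import base64
import requests

# Assumed local endpoints: two llama-server instances, one per model.
ENDPOINTS = {
    "gemma-4-e4b": "http://localhost:8080/v1/chat/completions",  # assumed port
    "qwen-3.5-4b": "http://localhost:8081/v1/chat/completions",  # assumed port
}

def encode_image(path: str) -> str:
    """Return a base64 data URI for a local image file."""
    with open(path, "rb") as f:
        return "data:image/jpeg;base64," + base64.b64encode(f.read()).decode()

def ask(endpoint: str, image_path: str, question: str) -> str:
    """Send one image plus question to an OpenAI-compatible local server."""
    payload = {
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url", "image_url": {"url": encode_image(image_path)}},
            ],
        }],
        "temperature": 0.0,  # keep decoding as deterministic as possible
        "max_tokens": 256,
    }
    r = requests.post(endpoint, json=payload, timeout=120)
    r.raise_for_status()
    return r.json()["choices"][0]["message"]["content"]

# Placeholder OCR spot-checks: (image file, prompt, expected substring).
CASES = [
    ("receipt_01.jpg", "Transcribe all visible text.", "TOTAL 42.17"),
    ("sign_02.jpg", "What does the sign say?", "NO PARKING"),
]

for name, url in ENDPOINTS.items():
    hits = 0
    for image, question, expected in CASES:
        answer = ask(url, image, question)
        hits += int(expected.lower() in answer.lower())
    print(f"{name}: {hits}/{len(CASES)} substring matches")
```

Substring matching is deliberately crude and only approximates OCR accuracy, but a handful of such cases is usually enough to see whether a gap as large as the one users describe shows up on your own images.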
// TAGS
gemma-4-e4b · llm · multimodal · benchmark · open-weights · google
DISCOVERED
2026-04-07
PUBLISHED
2026-04-06
RELEVANCE
8 / 10
AUTHOR
specji