OPEN_SOURCE
REDDIT // BENCHMARK RESULT
Gemma 4 coding: Harness choice swings performance
Google's Gemma 4 31B Dense and 26B MoE models emerge as open-weight coding powerhouses, with community benchmarks revealing significant performance variance across different agentic frameworks.
// ANALYSIS
Gemma 4 marks the end of "vibe coding" benchmarks as the agentic harness becomes just as important as the model weights.
- The 31B Dense model hits 80% on LiveCodeBench v6, rivaling proprietary frontier models in a local-first package.
- Frameworks like Kilo Code and Roo Code extract more performance through highly structured system prompts and autonomous tool execution.
- Score variation across harnesses (Claude Code vs. Kilo Code) suggests that "raw" model evals are increasingly decoupled from real-world agentic utility.
- The 26B MoE variant is the sweet spot for developers, offering 97% of Dense performance at a fraction of the inference cost.
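The harness-dependence described above comes down to the loop the framework runs around the model: a structured system prompt, parsing tool calls out of model output, executing them, and feeding results back. A minimal sketch in Python, where `fake_model`, the JSON tool protocol, and the `read_file` tool are all hypothetical stand-ins (a real harness would call a Gemma 4 endpoint and sandbox its tools):

```python
import json

# Hypothetical system prompt: real harnesses like Kilo Code use far more
# structured instructions; this is the minimal shape of the idea.
SYSTEM_PROMPT = (
    "You are a coding agent. Reply with JSON: "
    '{"tool": "read_file", "path": ...} or {"answer": ...}'
)

def fake_model(messages):
    """Stub standing in for a model API: asks for a file once, then answers."""
    if not any(m["role"] == "tool" for m in messages):
        return json.dumps({"tool": "read_file", "path": "config.txt"})
    return json.dumps({"answer": "config.txt has 1 line"})

# Toy tool registry; a real harness would expose sandboxed file/shell tools.
TOOLS = {"read_file": lambda path: "debug=true"}

def run_agent(model, task, max_steps=5):
    messages = [{"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = json.loads(model(messages))
        if "answer" in reply:
            return reply["answer"]
        # Autonomous tool execution: run the call, append the result,
        # and loop back to the model with the new context.
        result = TOOLS[reply["tool"]](reply["path"])
        messages.append({"role": "tool", "content": result})
    return None

print(run_agent(fake_model, "Summarize config.txt"))
```

How well the prompt scaffolding and tool loop match a given model's training is exactly what makes benchmark scores swing between harnesses.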
// TAGS
gemma-4 · ai-coding · llm · open-weights · benchmark · google · agent
DISCOVERED
4h ago
2026-04-18
PUBLISHED
6h ago
2026-04-17
RELEVANCE
10 / 10
AUTHOR
jazir55