Gemma 4 coding: Harness choice swings performance
OPEN SOURCE · REDDIT · 4h ago · BENCHMARK RESULT


Google's Gemma 4 31B dense and 26B MoE models emerge as open-weight coding powerhouses, with community benchmarks revealing significant performance variance across agentic frameworks.

// ANALYSIS

Gemma 4 marks the end of "vibe coding" benchmarks: the agentic harness now matters as much as the model weights.

  • The 31B dense model hits 80% on LiveCodeBench v6, rivaling proprietary frontier models in a local-first package.
  • Harnesses like Kilo Code and Roo Code extract more performance through highly structured system prompts and autonomous tool execution.
  • Score variation across harnesses (Claude Code vs. Kilo Code) suggests that "raw" model evals are increasingly decoupled from real-world agentic utility.
  • The 26B MoE variant is the sweet spot for developers, offering 97% of the dense model's performance at a fraction of the inference cost.
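The harness-variance point can be made concrete with a small sketch. The per-harness scores below are hypothetical placeholders, not measured results from the post; they only illustrate how the spread introduced by harness choice can dwarf the differences a single "raw" eval number would suggest.

```python
# Hypothetical pass-rate scores (%) for the SAME model weights run under
# different agentic harnesses. Numbers are illustrative only, not real data.
scores = {
    "Kilo Code": 80.0,
    "Roo Code": 78.5,
    "Claude Code": 71.0,
}

best = max(scores.values())
worst = min(scores.values())
spread = best - worst  # how much the harness alone moved the result

print(f"best={best}  worst={worst}  harness spread={spread:.1f} points")
```

If the spread across harnesses is several points wide, a single leaderboard number under-specifies real-world agentic utility: the harness moved the result more than many model-to-model gaps do.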
// TAGS
gemma-4 · ai-coding · llm · open-weights · benchmark · google · agent

DISCOVERED
2026-04-18 (4h ago)

PUBLISHED
2026-04-17 (6h ago)

RELEVANCE
10/10

AUTHOR
jazir55