Gemma 4 tops Qwen 3.5 in Go benchmarks

// 93d ago · BENCHMARK RESULT

Miguel Filipe published a benchmark study that evaluates local MoE models such as Gemma 4 and Qwen 3.5 with a functional Go integration test harness on a Framework 13 laptop. The results show a significant gap between theoretical performance and functional reliability when the models run quantized on local hardware.

// ANALYSIS

Functional execution testing is the only benchmark that matters for AI coding, because it exposes the flakiness that synthetic evals hide. Gemma 4 26B-A4B emerged as the winner and proved resistant to quantization degradation, while Qwen 3.5 35B struggled with consistency and compile failures despite its larger parameter count. The study highlights that longer context and higher memory bandwidth are critical for local MoE architectures on mobile platforms like the Ryzen AI 9 HX 370.
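
To make the idea of measuring consistency concrete, here is a minimal Go sketch of how repeated functional runs of one task could be tallied. It is not the study's harness: the `Outcome` values, the `Summary` type, and the `Summarize`, `PassRate`, and `Flaky` names are all hypothetical, and the sample outcomes in `main` are illustrative, not results from the benchmark.

```go
package main

import "fmt"

// Outcome classifies one attempt at a task. These names are hypothetical
// and are not taken from the study's harness.
type Outcome int

const (
	Pass        Outcome = iota // code compiled and the tests passed
	CompileFail                // generated code did not compile
	TestFail                   // code compiled but the tests failed
)

// Summary aggregates repeated attempts at the same task.
type Summary struct {
	Runs         int
	Passes       int
	CompileFails int
	TestFails    int
}

// Summarize counts outcomes across repeated runs of one task.
func Summarize(runs []Outcome) Summary {
	s := Summary{Runs: len(runs)}
	for _, o := range runs {
		switch o {
		case Pass:
			s.Passes++
		case CompileFail:
			s.CompileFails++
		case TestFail:
			s.TestFails++
		}
	}
	return s
}

// PassRate returns the fraction of runs that passed, or 0 when there are no runs.
func (s Summary) PassRate() float64 {
	if s.Runs == 0 {
		return 0
	}
	return float64(s.Passes) / float64(s.Runs)
}

// Flaky reports whether the same task both passed and failed across runs.
func (s Summary) Flaky() bool {
	return s.Passes > 0 && s.Passes < s.Runs
}

func main() {
	// Illustrative outcomes only; not data from the study.
	runs := []Outcome{Pass, CompileFail, Pass, TestFail}
	s := Summarize(runs)
	fmt.Printf("pass rate %.2f, compile failures %d, flaky %v\n",
		s.PassRate(), s.CompileFails, s.Flaky())
}
```

Separating compile failures from test failures, and flagging tasks with mixed outcomes, is one way a functional harness can surface the inconsistency that a single-shot score would hide.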

// TAGS

gemma-4, qwen-3-5, ai-coding, benchmark, local-llm, go

DISCOVERED

93d ago

2026-04-11

PUBLISHED

93d ago

2026-04-10

RELEVANCE

9/10

AUTHOR

m3thos
