OPEN_SOURCE
REDDIT · 3h ago · BENCHMARK RESULT
Gemma 4 31B beats Qwen in Pac-Man test
In a local one-shot Pac-Man gamedev contest on a MacBook Pro M5 Max, Gemma 4 31B won by producing a shorter, clearer, and more functional game than Qwen 3.6 27B. The post argues that for local coding tasks, output quality and execution speed matter more than raw tokens per second.
// ANALYSIS
The interesting part here is not the speed number alone; it is that Gemma used far fewer tokens and still delivered the more usable result. For one-shot game generation, that is often the better tradeoff because the model has to get logic, state, and interactions right in one pass.
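To make "one pass" concrete, here is a minimal, hypothetical sketch of the state and interaction logic a generated Pac-Man has to wire together correctly with no chance to iterate. None of this is from the post; `Game`, `step`, and the grid are illustrative stand-ins.

```python
# Illustrative sketch (not from the post): the minimal state a one-shot
# Pac-Man generation must get right in a single pass.
from dataclasses import dataclass, field

GRID = [
    "#########",
    "#.......#",
    "#.##.##.#",
    "#...G...#",
    "#########",
]

@dataclass
class Game:
    pacman: tuple = (1, 1)  # (row, col) player position
    ghosts: list = field(default_factory=lambda: [(3, 4)])
    pellets: set = field(default_factory=lambda: {
        (r, c) for r, row in enumerate(GRID)
               for c, ch in enumerate(row) if ch == "."
    })
    score: int = 0
    alive: bool = True

    def step(self, drow: int, dcol: int) -> None:
        """One tick: move, eat, then resolve collisions, i.e. the
        interacting logic the model cannot fix up after the fact."""
        r, c = self.pacman[0] + drow, self.pacman[1] + dcol
        if GRID[r][c] != "#":            # wall collision
            self.pacman = (r, c)
        if self.pacman in self.pellets:  # pellet pickup and scoring
            self.pellets.remove(self.pacman)
            self.score += 10
        if self.pacman in self.ghosts:   # ghost collision ends the game
            self.alive = False

game = Game()
game.step(0, 1)                # move right onto a pellet
print(game.score, game.alive)  # 10 True
```

Even this toy version shows why verbosity does not help: every extra line is another chance to break one of these tightly coupled rules.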
- Gemma finished much faster and with far less token spend, which matters on local hardware where iteration speed is part of the product
- Pac-Man is a decent stress test for codegen because it forces pathfinding, collision handling, game state, and rendering to work together
- Qwen's longer output may have been more creative, but verbosity is a liability when the goal is a playable result, not just an impressive transcript
- For local LLM workflows, "best" is often task-specific: concise correctness beats elaborate prose for gameplay code
- The post reinforces a broader point for AI builders: benchmark models on end-to-end utility, not just raw throughput (see the harness sketch after this list)
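A minimal sketch of what "end-to-end utility" could look like in practice, assuming a generic `generate` callable and a hypothetical `passes_smoke_test` check; neither is a real API from the post, just stand-ins for whatever local inference stack you run.

```python
# Hypothetical harness: score a model on whether it produced a working
# game, how long that took, and at what token cost, rather than on raw
# tokens per second alone.
import time
from typing import Callable

def passes_smoke_test(code: str) -> bool:
    """Cheapest playability proxy: the output must at least compile.
    A real harness would also launch the game and script some inputs."""
    try:
        compile(code, "<generated>", "exec")
        return True
    except SyntaxError:
        return False

def benchmark(model_name: str,
              generate: Callable[[str], str],
              prompt: str) -> dict:
    start = time.perf_counter()
    code = generate(prompt)                 # one-shot generation
    elapsed = time.perf_counter() - start
    return {
        "model": model_name,
        "seconds_to_result": round(elapsed, 2),
        "approx_tokens": len(code.split()),  # crude token-count proxy
        "runnable": passes_smoke_test(code), # the utility signal
    }
```

Under this framing, a model that emits half the tokens in a third of the time and still passes the smoke test wins, which is exactly the tradeoff the post credits to Gemma.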
// TAGS
gemma-4 · qwen · llm · benchmark · ai-coding
DISCOVERED
3h ago
2026-05-01
PUBLISHED
6h ago
2026-05-01
RELEVANCE
8/10
AUTHOR
gladkos