OPEN_SOURCE
REDDIT · 3h ago · BENCHMARK RESULT
Gemma 4 31B beats Qwen in Pac-Man test
In a local one-shot Pac-Man gamedev contest on a MacBook Pro M5 Max, Gemma 4 31B won by producing a shorter, clearer, and more functional game than Qwen 3.6 27B. The post argues that for local coding tasks, output quality and execution speed matter more than raw tokens per second.
// ANALYSIS
The interesting part here is not the speed number alone; it is that Gemma used far fewer tokens and still delivered the more usable result. For one-shot game generation, that is often the better tradeoff because the model has to get logic, state, and interactions right in one pass.
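To make "one pass" concrete, here is a minimal, hypothetical sketch of the state and interaction logic a generated Pac-Man has to wire together correctly with no chance to iterate. None of this is from the post; `Game`, `step`, and the grid are illustrative stand-ins.

```python
# Illustrative sketch (not from the post): the minimal state a one-shot
# Pac-Man generation must get right in a single pass.
from dataclasses import dataclass, field

GRID = [
    "#########",
    "#.......#",
    "#.##.##.#",
    "#...G...#",
    "#########",
]

@dataclass
class Game:
    pacman: tuple = (1, 1)  # (row, col) player position
    ghosts: list = field(default_factory=lambda: [(3, 4)])
    pellets: set = field(default_factory=lambda: {
        (r, c) for r, row in enumerate(GRID)
               for c, ch in enumerate(row) if ch == "."
    })
    score: int = 0
    alive: bool = True

    def step(self, drow: int, dcol: int) -> None:
        """One tick: move, eat, then resolve collisions, i.e. the
        interacting logic the model cannot fix up after the fact."""
        r, c = self.pacman[0] + drow, self.pacman[1] + dcol
        if GRID[r][c] != "#":            # wall collision
            self.pacman = (r, c)
        if self.pacman in self.pellets:  # pellet pickup and scoring
            self.pellets.remove(self.pacman)
            self.score += 10
        if self.pacman in self.ghosts:   # ghost collision ends the game
            self.alive = False

game = Game()
game.step(0, 1)                # move right onto a pellet
print(game.score, game.alive)  # 10 True
```

Even this toy version shows why verbosity does not help: every extra line is another chance to break one of these tightly coupled rules.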
- Gemma finished much faster and with far less token spend, which matters on local hardware where iteration speed is part of the product
- Pac-Man is a decent stress test for codegen because it forces pathfinding, collision handling, game state, and rendering to work together
- Qwen's longer output may have been more creative, but verbosity is a liability when the goal is a playable result, not just an impressive transcript
- For local LLM workflows, "best" is often task-specific: concise correctness beats elaborate prose for gameplay code
- The post reinforces a broader point for AI builders: benchmark models on end-to-end utility, not just raw throughput (see the harness sketch after this list)
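A minimal sketch of what "end-to-end utility" could look like in practice, assuming a generic `generate` callable and a hypothetical `passes_smoke_test` check; neither is a real API from the post, just stand-ins for whatever local inference stack you run.

```python
# Hypothetical harness: score a model on whether it produced a working
# game, how long that took, and at what token cost, rather than on raw
# tokens per second alone.
import time
from typing import Callable

def passes_smoke_test(code: str) -> bool:
    """Cheapest playability proxy: the output must at least compile.
    A real harness would also launch the game and script some inputs."""
    try:
        compile(code, "<generated>", "exec")
        return True
    except SyntaxError:
        return False

def benchmark(model_name: str,
              generate: Callable[[str], str],
              prompt: str) -> dict:
    start = time.perf_counter()
    code = generate(prompt)                 # one-shot generation
    elapsed = time.perf_counter() - start
    return {
        "model": model_name,
        "seconds_to_result": round(elapsed, 2),
        "approx_tokens": len(code.split()),  # crude token-count proxy
        "runnable": passes_smoke_test(code), # the utility signal
    }
```

Under this framing, a model that emits half the tokens in a third of the time and still passes the smoke test wins, which is exactly the tradeoff the post credits to Gemma.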
// TAGS
gemma-4 · qwen · llm · benchmark · ai-coding
DISCOVERED
3h ago
2026-05-01
PUBLISHED
6h ago
2026-05-01
RELEVANCE
8/10
AUTHOR
gladkos