OPEN_SOURCE
REDDIT · 5h ago · BENCHMARK RESULT
Qwopus tops local flight sim benchmark
A community benchmark ran nine local models, all as 8-bit MLX quantizations, on the same prompt: build a single-file browser flight-combat game. Qwopus 3.5 27B ranked first. The results suggested that quantization provider, task completion, and emergent implementation choices mattered more than raw parameter count.
// ANALYSIS
This is not a rigorous leaderboard, but it is a useful artifact test because it measures whether models can ship something playable, not just pass canned coding evals.
- Qwopus winning with fewer prompts while implementing real flight physics is a strong signal of distillation quality on messy creative coding tasks.
- The three Qwen3.6 35B quant variants behaved differently, which makes quantization source a practical evaluation dimension rather than just a packaging detail.
- Qwen Coder Next 80B produced the most code but one of the weakest games, a reminder that a larger code output can hide worse product judgment.
- The wildcard plane choice worked as a lightweight creativity probe, exposing differences that standard benchmark prompts usually miss.
// TAGS
flight-combat-llm-comparison · llm · benchmark · ai-coding · self-hosted · open-weights · inference
DISCOVERED: 5h ago (2026-04-21)
PUBLISHED: 7h ago (2026-04-21)
RELEVANCE: 8/10
AUTHOR: StudentDifficult8240