OPEN_SOURCE
REDDIT · 4h ago · BENCHMARK RESULT
little-coder lifts Qwen into top 10
little-coder paired with Qwen3.6-35B-A3B scored 78.67% on the full 225-task Aider Polyglot benchmark, up from roughly 45.6% with Qwen3.5 9B in the same scaffold. The run was offline on an 8 GB laptop GPU using llama.cpp, strengthening the case that agent harness design can matter as much as model size for local coding performance.
// ANALYSIS
This is less a clean model victory than a scaffold warning shot: local models may look weaker partly because they are tested inside agents tuned for frontier-model behavior.
- The result puts a 35B-total, 3B-active MoE model in the public top-10 band on Aider Polyglot, which is unusually strong for an offline local setup.
- The biggest gain came from first-attempt solves, suggesting Qwen3.6-35B-A3B is doing more than benefiting from retry mechanics.
- little-coder’s small-model guardrails, including tool-use constraints, workspace discovery, and reasoning-budget control, make the harness part of the benchmark result.
- The methodology is still self-reported and benchmark-specific, so Terminal Bench and GAIA follow-ups will matter before generalizing the claim.
- For developers running local agents, this points toward optimizing scaffolds, prompts, and tool loops before assuming only larger cloud models can compete.
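The guardrails described above can be sketched as a gate that every tool call passes through. This is a minimal hypothetical illustration of the pattern, not little-coder’s actual implementation; all names (`Guardrails`, `Harness`, `request_tool`) are invented for the example.

```python
# Hypothetical sketch of small-model guardrails in an agent harness:
# a tool whitelist plus hard budgets on tool calls, enforced outside the model.

from dataclasses import dataclass, field

@dataclass
class Guardrails:
    allowed_tools: set[str]        # tool-use constraint: whitelist only
    max_tool_calls: int = 8        # hard cap on loop iterations
    reasoning_budget: int = 512    # max "thinking" tokens granted per step

@dataclass
class Harness:
    guards: Guardrails
    calls: int = 0
    log: list = field(default_factory=list)

    def request_tool(self, name: str, args: dict) -> dict:
        """Gate every model-requested tool call through the guardrails."""
        if name not in self.guards.allowed_tools:
            return {"error": f"tool '{name}' not permitted"}
        if self.calls >= self.guards.max_tool_calls:
            return {"error": "tool budget exhausted; produce final answer"}
        self.calls += 1
        self.log.append((name, args))
        return {"ok": True}

harness = Harness(Guardrails(allowed_tools={"read_file", "apply_patch"}))
print(harness.request_tool("read_file", {"path": "src/main.py"}))   # allowed
print(harness.request_tool("run_shell", {"cmd": "make test"}))      # rejected
```

The point of the pattern is that the harness, not the model, enforces the limits, which is why a small model can score well inside a scaffold tuned for it.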
// TAGS
little-coder · qwen3.6-35b-a3b · ai-coding · agent · llm · benchmark · self-hosted · open-source
DISCOVERED
4h ago
2026-04-22
PUBLISHED
5h ago
2026-04-22
RELEVANCE
9/10
AUTHOR
Creative-Regular6799