little-coder lifts Qwen into top 10
little-coder paired with Qwen3.6-35B-A3B scored 78.67% on the full 225-task Aider Polyglot benchmark, up from roughly 45.6% with Qwen3.5 9B in the same scaffold. The run was offline on an 8 GB laptop GPU using llama.cpp, strengthening the case that agent harness design can matter as much as model size for local coding performance.
This is less a clean model victory than a scaffold warning shot: local models may look weaker partly because they are tested inside agents tuned for frontier-model behavior.
- The result puts a 35B-total, 3B-active MoE model in the public top-10 band on Aider Polyglot, which is unusually strong for an offline local setup.
- The biggest gain came from first-attempt solves, suggesting Qwen3.6-35B-A3B is doing more than benefiting from retry mechanics.
- little-coder’s small-model guardrails, including tool-use constraints, workspace discovery, and reasoning-budget control, make the harness part of the benchmark result.
- The methodology is still self-reported and benchmark-specific, so Terminal Bench and GAIA follow-ups will matter before generalizing the claim.
- For developers running local agents, this points toward optimizing scaffolds, prompts, and tool loops before assuming only larger cloud models can compete.
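To put the headline percentages in task terms, the short sketch below converts the reported pass rates into implied solved-task counts on the 225-task benchmark. The helper name `implied_solved` is hypothetical, and because the earlier score is only given as "roughly 45.6%", its task count is an approximation rather than a reported figure.

```python
TOTAL_TASKS = 225  # full Aider Polyglot task count, as stated in the summary


def implied_solved(pass_rate_pct: float, total: int = TOTAL_TASKS) -> int:
    """Nearest whole number of solved tasks implied by a pass rate in percent."""
    return round(pass_rate_pct / 100 * total)


# Reported score for Qwen3.6-35B-A3B in little-coder
new_solved = implied_solved(78.67)

# Approximate earlier score for Qwen3.5 9B in the same scaffold
old_solved = implied_solved(45.6)

# Difference between the two reported pass rates, in percentage points
gain_points = 78.67 - 45.6

print(f"Implied solves: {new_solved} vs about {old_solved} of {TOTAL_TASKS}")
print(f"Gain: about {gain_points:.1f} percentage points")
```

Read this way, the reported jump corresponds to roughly 74 additional tasks solved out of 225, under the rounding assumption above.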
DISCOVERED
45d ago
2026-04-22
PUBLISHED
45d ago
2026-04-22
AUTHOR
Creative-Regular6799