little-coder lifts Qwen into top 10
OPEN_SOURCE
REDDIT // 4h ago · BENCHMARK RESULT

little-coder paired with Qwen3.6-35B-A3B scored 78.67% on the full 225-task Aider Polyglot benchmark, up from 45.6% with Qwen3.5 9B in the same scaffold. The run was performed fully offline on an 8 GB laptop GPU via llama.cpp, strengthening the case that agent harness design can matter as much as model size for local coding performance.
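As a back-of-envelope check, the reported pass rates translate into whole-task counts on the 225-task suite (the task counts below are derived from the percentages in the post, not reported directly):

```python
# Convert the reported Aider Polyglot pass rates into task counts.
TASKS = 225  # full Aider Polyglot suite size

def solved(pass_rate_pct: float, tasks: int = TASKS) -> int:
    """Turn a percentage pass rate into a whole number of solved tasks."""
    return round(tasks * pass_rate_pct / 100)

new = solved(78.67)  # Qwen3.6-35B-A3B in little-coder
old = solved(45.6)   # Qwen3.5 9B in the same scaffold
print(new, old, new - old)  # → 177 103 74
```

Roughly 74 additional tasks solved by the same harness after the model swap.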

// ANALYSIS

This is less a clean model victory than a scaffold warning shot: local models may look weaker partly because they are tested inside agents tuned for frontier-model behavior.

  • The result puts a 35B-total, 3B-active MoE model in the public top-10 band on Aider Polyglot, which is unusually strong for an offline local setup.
  • The biggest gain came from first-attempt solves, suggesting Qwen3.6-35B-A3B is doing more than benefiting from retry mechanics.
  • little-coder’s small-model guardrails, including tool-use constraints, workspace discovery, and reasoning-budget control, make the harness part of the benchmark result.
  • The methodology is still self-reported and benchmark-specific, so Terminal Bench and GAIA follow-ups will matter before generalizing the claim.
  • For developers running local agents, this points toward optimizing scaffolds, prompts, and tool loops before assuming only larger cloud models can compete.
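The post does not publish little-coder's actual guardrail implementation; as a purely hypothetical sketch of the pattern described above (tool-use constraints plus a reasoning-budget cap), a small-model harness step validator might look roughly like this, with all names and limits invented for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class GuardrailConfig:
    # Hypothetical knobs; little-coder's real settings are not in the post.
    allowed_tools: set = field(
        default_factory=lambda: {"read_file", "apply_patch", "run_tests"}
    )
    reasoning_token_budget: int = 1024  # cap on per-step reasoning tokens
    max_tool_calls: int = 20            # cap on tool calls per task

def validate_step(cfg: GuardrailConfig, tool: str,
                  reasoning_tokens: int, calls_so_far: int) -> bool:
    """Reject agent steps that use an unlisted tool or exceed a budget."""
    if tool not in cfg.allowed_tools:
        return False
    if reasoning_tokens > cfg.reasoning_token_budget:
        return False
    if calls_so_far >= cfg.max_tool_calls:
        return False
    return True

cfg = GuardrailConfig()
print(validate_step(cfg, "apply_patch", 512, 3))  # → True
print(validate_step(cfg, "web_search", 512, 3))   # → False
```

The point of such constraints is that small models drift more easily in open-ended tool loops, so bounding the action space and the reasoning budget becomes part of what the benchmark actually measures.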
// TAGS
little-coder · qwen3.6-35b-a3b · ai-coding · agent · llm · benchmark · self-hosted · open-source

DISCOVERED

4h ago

2026-04-22

PUBLISHED

5h ago

2026-04-22

RELEVANCE

9 / 10

AUTHOR

Creative-Regular6799