Qwen3.6-35B-A3B tops local agent benchmarks
This Reddit benchmark claims Qwen3.6-35B-A3B is the fastest model tested on an M3 Ultra while also posting near-universal tool-calling compatibility across Hermes Agent, PydanticAI, LangChain, smolagents, and OpenClaude. The post positions it as the strongest local agent backend in the Qwen family, especially for Apple Silicon users.
If these results hold up, Qwen3.6-35B-A3B is more interesting as an agent runtime than as a pure benchmark champ: speed plus broad framework compatibility is exactly what local builders need.
- Hermes looks like the real stress test here; passing the hardest harness matters more than scoring well on forgiving frameworks.
- smolagents appears to flatter weaker models because it leans on code generation instead of strict structured tool calling, so its 100% numbers are not directly comparable.
- The low HumanEval/MMLU numbers on Qwen3.6 likely reflect day-0 quantization or evaluation noise more than the model’s true ceiling.
- Qwen3.5-35B at 8-bit reads like the safer all-rounder if you can spare RAM; Qwen3.6 is the speed pick.
- The broader takeaway is that local agent stacks are now mostly a framework compatibility problem, not just a model-quality problem.
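To make the strict-versus-forgiving distinction above concrete, here is a minimal sketch of what a strict structured tool-calling check might look like. It is not taken from the Reddit post or from any of the named frameworks: the `TOOLS` registry, the `get_weather` tool, and `validate_tool_call` are all hypothetical names, and the JSON shape (`name` plus `arguments`) is an assumption chosen for illustration.

```python
import json

# Hypothetical tool registry (an assumption for this sketch):
# tool name -> required argument names and their expected Python types.
TOOLS = {
    "get_weather": {"city": str, "days": int},
}


def validate_tool_call(raw: str, tools: dict = TOOLS) -> tuple[bool, str]:
    """Return (ok, reason) for a model's raw tool-call output."""
    # A strict harness rejects anything that is not well-formed JSON.
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return False, "not valid JSON"
    if not isinstance(call, dict):
        return False, "not a JSON object"

    # The tool name must match a registered tool exactly.
    name = call.get("name")
    if not isinstance(name, str) or name not in tools:
        return False, "unknown tool"

    args = call.get("arguments")
    if not isinstance(args, dict):
        return False, "arguments must be an object"

    # Every required argument must be present with the expected type.
    schema = tools[name]
    for key, expected in schema.items():
        if key not in args:
            return False, f"missing argument: {key}"
        if not isinstance(args[key], expected):
            return False, f"wrong type for {key}"

    # Arguments outside the schema are rejected as well.
    if set(args) - set(schema):
        return False, "unexpected arguments"
    return True, "ok"


if __name__ == "__main__":
    good = '{"name": "get_weather", "arguments": {"city": "Oslo", "days": 3}}'
    stringly = '{"name": "get_weather", "arguments": {"city": "Oslo", "days": "3"}}'
    as_code = 'get_weather(city="Oslo", days=3)'
    for sample in (good, stringly, as_code):
        print(validate_tool_call(sample))
```

Under a check like this, a model that emits the right intent in the wrong shape (a quoted number, or a Python-style call) fails outright, whereas a code-generation harness could plausibly run the third sample as-is. That is one way the same model could score very differently across frameworks.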
DISCOVERED: 2026-04-18 (45d ago)
PUBLISHED: 2026-04-18 (45d ago)
AUTHOR: Striking-Swim6702