OPEN_SOURCE
REDDIT // BENCHMARK RESULT
Qwen3-30B-A3B-Instruct-2507 Beats Qwen3.6 on Judge Benchmark
A Reddit user reports that Qwen3-30B-A3B-Instruct-2507 outperforms the newer Qwen3.5/3.6 variants on a judge-based benchmark, with dense Gemma 4 edging it out overall. The post treats the result as a reminder that tuning style and task fit can matter more than release recency.
// ANALYSIS
This looks less like “older model is magically better” and more like a benchmark-to-model mismatch. Qwen3-30B-A3B-Instruct-2507 is the updated non-thinking instruct release, while Qwen3.6 is positioned around broader agentic utility and thinking preservation. Prompt distribution, judge bias, output style, and whether the task rewards concise non-thinking answers could therefore all affect the result.
// TAGS
qwen · qwen3 · qwen36 · instruct · moe · benchmark · llm-as-judge · local-llama · gemma4
DISCOVERED
2026-04-19
PUBLISHED
2026-04-19
RELEVANCE
8/10
AUTHOR
Theboyscampus