Qwen3-30B-A3B-Instruct-2507 Tops Qwen3.6 Judge Benchmark

// 90d ago · BENCHMARK RESULT

A Reddit user reports that Qwen3-30B-A3B-Instruct-2507 outperforms the newer Qwen 3.5/3.6 variants on an LLM-as-judge benchmark, though a dense Gemma 4 model edges it out overall. The post treats the result as a reminder that tuning style and task fit can matter more than release recency.

// ANALYSIS

This looks less like “older model is magically better” and more like a benchmark-to-model mismatch. Qwen3-30B-A3B-Instruct-2507 is the updated non-thinking instruct release, while Qwen3.6 is positioned around broader agentic utility and thinking preservation. Prompt distribution, judge bias, output style, and whether the task rewards concise non-thinking answers could therefore all affect the result.

// TAGS

qwen, qwen3, qwen36, instruct, moe, benchmark, llm-as-judge, local-llama, gemma4

DISCOVERED

90d ago

2026-04-19

PUBLISHED

90d ago

2026-04-19

RELEVANCE

8/10

AUTHOR

Theboyscampus