OPEN_SOURCE
REDDIT // BENCHMARK RESULT
Qwen3.5-122B-A10B thinking mode trails GPT-OSS
On a roughly 17k-character tagging prompt, a Reddit user reports GPT-OSS-120B finishing in about 25 seconds while Qwen3.5-122B-A10B took more than four minutes. The likely culprit is Qwen3.5’s default thinking mode, which can add a long reasoning pass before the final answer.
// ANALYSIS
This smells like a reasoning-budget problem, not a bandwidth problem: Qwen3.5’s default thinking pass can dominate wall time on easy extraction jobs.
- Qwen3.5-122B-A10B is a 122B-parameter MoE with 10B activated parameters, so raw tok/s can look fine even while extra reasoning tokens inflate end-to-end latency.
- The model card explicitly says thinking mode is on by default and documents a non-thinking configuration, which is the lever to pull for low-latency pipelines.
- For tagging and other narrow extraction tasks, benchmark thinking and non-thinking modes separately; otherwise you're measuring reasoning budget, not model quality.
- If latency is the blocker, GPT-OSS-120B is probably the safer fast-path default, with Qwen3.5 reserved for harder prompts.
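A minimal sketch of the separate-mode benchmark the bullets suggest. The `generate(prompt, thinking)` callable is a hypothetical wrapper around whatever client you use; the thinking flag would map to the model's documented switch (Qwen3-generation models expose an `enable_thinking` chat-template kwarg, and the model card cited above documents a non-thinking configuration, but the exact name for Qwen3.5 is an assumption to verify). The stub client below only mimics the latency gap so the harness runs standalone.

```python
import time

def time_modes(generate, prompt, modes=(True, False)):
    """Time one generation per mode.

    generate(prompt, thinking) -> str is any callable wrapping your
    inference client; the boolean maps to the model's thinking switch
    (e.g. Qwen3's `enable_thinking` chat-template kwarg -- an
    assumption for Qwen3.5, check the model card).
    Returns {thinking_flag: (seconds, output)}.
    """
    results = {}
    for thinking in modes:
        t0 = time.perf_counter()
        out = generate(prompt, thinking)
        results[thinking] = (time.perf_counter() - t0, out)
    return results

# Stub standing in for a real client call, mimicking the reasoning
# pass dominating wall time on an easy extraction job.
def fake_generate(prompt, thinking):
    time.sleep(0.2 if thinking else 0.05)
    return "tags: benchmark, llm"

res = time_modes(fake_generate, "Extract tags from: ...")
assert res[True][0] > res[False][0]  # thinking pass costs more wall time
```

Swapping the stub for a real call against both models on the same 17k-character prompt would separate reasoning budget from model throughput, which is the comparison the Reddit thread conflates.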
// TAGS
qwen3.5-122b-a10b · gpt-oss-120b · llm · reasoning · inference · benchmark · open-weights
DISCOVERED
2026-03-24
PUBLISHED
2026-03-23
RELEVANCE
7 / 10
AUTHOR
florinandrei