Qwen3.5-122B-A10B thinking mode trails GPT-OSS
On a roughly 17k-character tagging prompt, a Reddit user reports GPT-OSS-120B finishing in about 25 seconds while Qwen3.5-122B-A10B took more than four minutes. The likely culprit is Qwen3.5’s default thinking mode, which can add a long reasoning pass before the final answer.
This smells like a reasoning-budget problem, not a bandwidth problem: Qwen3.5’s default thinking pass can dominate wall time on easy extraction jobs.
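As a rough illustration of why reasoning tokens rather than throughput can dominate wall time, here is a back-of-the-envelope latency model. Every number in it (prompt size in tokens, prefill and decode rates, token counts) is a made-up placeholder, not a measurement from the Reddit report or from either model.

```python
def end_to_end_seconds(prompt_tokens: int,
                       reasoning_tokens: int,
                       answer_tokens: int,
                       prefill_tok_per_s: float,
                       decode_tok_per_s: float) -> float:
    """Estimate wall time as prefill time plus decode time.

    Reasoning tokens are decoded at the same rate as answer tokens,
    so they add latency even though tok/s stays unchanged.
    """
    prefill = prompt_tokens / prefill_tok_per_s
    decode = (reasoning_tokens + answer_tokens) / decode_tok_per_s
    return prefill + decode


# Hypothetical figures, chosen only to show the shape of the effect.
PROMPT_TOKENS = 4000       # assumed token count for a long tagging prompt
ANSWER_TOKENS = 200        # assumed size of the final tag list
PREFILL_RATE = 1000.0      # assumed prefill tokens per second
DECODE_RATE = 40.0         # assumed decode tokens per second, same in both runs

non_thinking = end_to_end_seconds(PROMPT_TOKENS, 0, ANSWER_TOKENS,
                                  PREFILL_RATE, DECODE_RATE)
thinking = end_to_end_seconds(PROMPT_TOKENS, 8000, ANSWER_TOKENS,
                              PREFILL_RATE, DECODE_RATE)

print(f"non-thinking: {non_thinking:.0f}s, thinking: {thinking:.0f}s")
```

With the decode rate held constant, the only thing that changes between the two runs is the number of generated tokens, which is the sense in which this is a budget problem rather than a bandwidth one.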
- Qwen3.5-122B-A10B is a 122B-parameter MoE with 10B activated, so raw tok/s can look fine even when extra reasoning tokens inflate end-to-end latency.
- The model card explicitly says thinking is on by default and documents a non-thinking configuration, which is the lever you want for low-latency pipelines.
- For tagging and other narrow extraction tasks, benchmark thinking and non-thinking modes separately; otherwise you’re measuring reasoning budget, not just model quality.
- If latency is the blocker, GPT-OSS-120B is probably the safer fast-path default, while Qwen3.5 stays the better choice for harder prompts.
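One way to benchmark the two modes separately is a small timing harness like the sketch below. It assumes a hypothetical `generate(prompt, thinking)` callable that you write yourself around whatever serving stack you use; how thinking is actually switched off depends on that stack and on the model card, and is not shown here. The names `benchmark_modes`, `repeats`, and `clock` are likewise illustrative.

```python
import time
from statistics import median
from typing import Callable, Dict, List


def benchmark_modes(generate: Callable[[str, bool], str],
                    prompts: List[str],
                    repeats: int = 3,
                    clock: Callable[[], float] = time.perf_counter
                    ) -> Dict[str, float]:
    """Return the median wall time per request for each mode.

    `generate` is a user-supplied (hypothetical) wrapper that sends one
    prompt to the model with thinking enabled or disabled and blocks
    until the full response has arrived.
    """
    results: Dict[str, float] = {}
    for label, thinking in (("thinking", True), ("non_thinking", False)):
        timings: List[float] = []
        for prompt in prompts:
            for _ in range(repeats):
                start = clock()
                generate(prompt, thinking)
                timings.append(clock() - start)
        # Median is less sensitive than the mean to one slow outlier.
        results[label] = median(timings)
    return results
```

Running the same prompts through both modes, and comparing tag quality alongside the timings, separates the cost of the reasoning pass from the model's underlying speed.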
DISCOVERED
65d ago
2026-03-24
PUBLISHED
65d ago
2026-03-23
AUTHOR
florinandrei