Qwen3.5-122B-A10B thinking mode trails GPT-OSS

// 111d agoBENCHMARK RESULT

Qwen3.5-122B-A10B thinking mode trails GPT-OSS

On a roughly 17k-character tagging prompt, a Reddit user reports GPT-OSS-120B finishing in about 25 seconds while Qwen3.5-122B-A10B took more than four minutes. The likely culprit is Qwen3.5’s default thinking mode, which can add a long reasoning pass before the final answer.

// ANALYSIS

This smells like a reasoning-budget problem, not a bandwidth problem: Qwen3.5’s default thinking pass can dominate wall time on easy extraction jobs.

–Qwen3.5-122B-A10B is a 122B-parameter MoE with 10B activated, so raw tok/s can look fine even when extra reasoning tokens inflate end-to-end latency.
–The model card explicitly says thinking is on by default and documents a non-thinking configuration, which is the lever you want for low-latency pipelines.
–For tagging and other narrow extraction tasks, benchmark thinking and non-thinking modes separately; otherwise you’re measuring reasoning budget, not just model quality.
–If latency is the blocker, GPT-OSS-120B is probably the safer fast-path default, while Qwen3.5 stays the better choice for harder prompts.

// TAGS

qwen3.5-122b-a10bgpt-oss-120bllmreasoninginferencebenchmarkopen-weights

DISCOVERED

111d ago

2026-03-24

PUBLISHED

111d ago

2026-03-23

RELEVANCE

7/ 10

AUTHOR

florinandrei

// KEEP READING

More AI developer news from the feed

EXPLORE FULL FEED

UPDATE19m ago

Inference optimizations boost GPT-5.6 Sol usage limits

Recent updates for Codex and ChatGPT Work have introduced inference optimizations, the savings of which are being passed directly to users. This results in approximately 10% more usage for all GPT-5.6 Sol subscriptions, with an emphasis on providing improvements without any feature restrictions.

UPDATE1h ago

Claude Code ignores admin SCIM plugin policies

An enterprise user highlighted a critical gap where marketplace plugin selection policies configured in the Claude Admin panel and mapped to SCIM groups do not sync or apply to Claude Code. This limitation breaks the centralized context administration model for organizations attempting broad, secure deployments of Claude across developer environments, as the CLI continues to rely on localized configuration controls instead of real-time organization policies.

VIDEO1h ago

Hookdeck tames webhook chaos, powers event-driven architectures

Better Stack Podcast episode 17 explores event-driven architectures, webhook chaos, and how AI agents change event handling. Hookdeck is highlighted as an Event Gateway designed to reliably queue, secure, and manage asynchronous webhooks and events.