OPEN_SOURCE
REDDIT // 1d ago · MODEL RELEASE
Qwen3.5-9B hits performance wall in llama.cpp
Users report significantly lower throughput for the newly released Qwen3.5-9B model compared to predecessors, likely due to its hybrid architecture and unoptimized inference settings in popular local engines like llama.cpp.
// ANALYSIS
Qwen3.5's "thinking" capabilities and hybrid Gated DeltaNet/MoE architecture are currently outstripping local optimization efforts, causing a performance "cliff" on consumer hardware.
- Architectural complexity (Gated Delta Networks) requires specific llama.cpp updates that are still maturing, leading to high CPU overhead and low GPU utilization.
- Default "reasoning" modes add significant token overhead, making the model feel slower than dense 8B-9B counterparts despite superior benchmark scores.
- High VRAM usage for the MoE layers often triggers silent system-memory fallbacks on 16GB cards, slashing speeds by up to 70%.
- Early optimization fixes include reducing --ubatch-size to match GPU cache and explicitly disabling the reasoning budget for standard chat tasks.
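The fixes in the last bullet might look something like the following llama.cpp server invocation. This is a hedged sketch: the model filename is hypothetical, and flag availability and exact names (notably `--reasoning-budget`) vary between llama.cpp builds, so verify against `llama-server --help` on your version.

```shell
# Sketch of the reported workarounds, not an official recipe.
# Assumes a recent llama.cpp build; the GGUF filename is a placeholder.
llama-server \
  -m qwen3.5-9b-q4_k_m.gguf \   # hypothetical quantized model file
  --n-gpu-layers 99 \           # keep all layers in VRAM to avoid sysmem fallback
  --ubatch-size 256 \           # smaller micro-batch to fit GPU cache
  --reasoning-budget 0          # disable "thinking" tokens for plain chat
```

Note that on NVIDIA cards the silent system-memory fallback itself is a driver-level policy; the flags above only reduce the chance of hitting it by keeping the working set inside VRAM.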
// TAGS
qwen3.5-9b · llm · ai-coding · reasoning · open-source · inference · gpu
DISCOVERED
2026-04-10
PUBLISHED
2026-04-10
RELEVANCE
9/10
AUTHOR
soyalemujica