OPEN_SOURCE
REDDIT // 3h ago · BENCHMARK RESULT
Qwen3.6-35B-A3B hits 21.7 tok/s on consumer GPUs
Local LLM benchmarks show that Qwen3.6-35B-A3B, a sparse Mixture-of-Experts model, achieves 21.7 tokens/second on dual RTX 5060 Ti GPUs using hybrid offloading. The model bridges the gap between high parameter counts and consumer hardware, and excels at agentic coding with a 73.4% SWE-bench Verified score.
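For context, a minimal sketch of the kind of launch configuration the post describes: llama-server with all layers on GPU but MoE expert weights kept in system RAM via --cpu-moe. The GGUF file name and context size are illustrative assumptions, not details from the post.

```python
# Hedged sketch: hybrid MoE offloading with llama.cpp on two GPUs.
# Assumptions: llama-server is on PATH, the quant file name is hypothetical,
# and the build supports --cpu-moe (keeps expert tensors in system RAM).
import subprocess

cmd = [
    "llama-server",
    "-m", "qwen3.6-35b-a3b-q4_k_m.gguf",  # hypothetical GGUF quant
    "-ngl", "99",              # offload all layers to GPU...
    "--cpu-moe",               # ...but keep MoE expert weights in system RAM
    "--tensor-split", "1,1",   # split GPU-resident weights across both cards
    "-c", "16384",             # illustrative context length; tune to VRAM
]
subprocess.run(cmd, check=True)
```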
// ANALYSIS
Sparse MoE architectures are making high-end reasoning viable on consumer-grade setups, though prompt processing remains a significant bottleneck compared to dense models.
- Hybrid offloading (--cpu-moe) provides "free" performance gains by offloading inactive experts to system RAM without sacrificing generation speed.
- The model shows a major reasoning leap, outperforming the Qwen 3.5 dense variant by a substantial margin in agentic benchmarks like Terminal-Bench 2.0.
- PCIe bandwidth limits ingestion efficiency, leaving dense models with a nearly 2x advantage in prompt processing speeds.
- Technical stability remains a challenge; current custom llama.cpp builds crash when combining Gated Delta Net optimizations with hybrid offloading.
- Real-world tests confirm autonomous reliability, with the model successfully completing multi-step tool calls for infrastructure automation (a sketch of one such round trip follows this list).
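A rough sketch of what one tool-call round trip looks like against llama-server's OpenAI-compatible endpoint (tool calling requires launching the server with --jinja). The port, tool name, schema, and prompt are illustrative assumptions; the post does not specify its exact setup.

```python
# Hedged sketch of a single tool-call round trip against a local llama-server
# (OpenAI-compatible /v1/chat/completions). Tool name and schema are
# hypothetical stand-ins for the post's infrastructure-automation tools.
import json
import requests

tools = [{
    "type": "function",
    "function": {
        "name": "restart_service",  # hypothetical tool, not from the post
        "description": "Restart a systemd service on a managed host",
        "parameters": {
            "type": "object",
            "properties": {
                "host": {"type": "string"},
                "service": {"type": "string"},
            },
            "required": ["host", "service"],
        },
    },
}]

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",  # llama-server default port
    json={
        "model": "qwen3.6-35b-a3b",
        "messages": [{"role": "user", "content": "nginx on web-01 is down; fix it."}],
        "tools": tools,
    },
    timeout=120,
)
# If the model decides to call a tool, the reply carries a tool_calls entry;
# an agent loop would execute it and feed the result back as a tool message.
call = resp.json()["choices"][0]["message"].get("tool_calls", [None])[0]
if call:
    print(call["function"]["name"], json.loads(call["function"]["arguments"]))
```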
// TAGS
qwen3.6-35b-a3b · llm · benchmark · gpu · inference · open-weights · reasoning · ai-coding
DISCOVERED
3h ago
2026-04-17
PUBLISHED
6h ago
2026-04-17
RELEVANCE
9/10
AUTHOR
Defilan