OPEN_SOURCE
REDDIT // 6h ago · MODEL RELEASE
Qwen3.6 Q2_K_XL runs fast, freezes
This Reddit report says Unsloth's Qwen3.6-35B-A3B UD Q2_K_XL quant can reach 30-40 tk/s through Claude Code on a consumer desktop, but some 5070 Ti setups freeze the PC or graphics driver as prompt processing ends. The poster ties the failures to offload and larger contexts, suggesting a very fast but fragile local inference setup.
// ANALYSIS
Strong model, but this reads less like a “model is broken” complaint and more like a very aggressive local deployment hitting the edge of consumer hardware stability.
- The win here is obvious: the poster is getting much better coding behavior than with a smaller Qwen3.5 9B quant.
- The failure mode is also telling: a freeze right as prompt processing ends usually points to VRAM pressure, driver instability, or offload/memory-management issues rather than a token-generation slowdown.
- Q2_K_XL is attractive for speed and size, but at this scale it may simply be too fragile for some setups where the same desktop GPU also drives the display.
- Net: high upside, but this is the kind of setup that needs careful tuning of offload and context size, and it may still be unreliable in practice.
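To make the VRAM-pressure point concrete, here is a rough back-of-the-envelope sketch of how context size eats into GPU headroom. It assumes a plain transformer KV cache (hybrid or linear-attention layers would change the count). The layer count, KV-head count, head dimension, on-GPU weight size, and display reservation are hypothetical placeholders, not the actual Qwen3.6-35B-A3B configuration or measurements from the post. Only the 16 GiB figure reflects the 5070 Ti's VRAM.

```python
GIB = 1024 ** 3


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bytes_per_elem=2):
    """Size of the K and V tensors across all layers at full context.

    Assumes a standard attention KV cache; bytes_per_elem=2 means fp16.
    """
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem


def vram_headroom_gib(vram_gib, weights_on_gpu_gib, kv_bytes, reserved_gib):
    """VRAM left after weights, KV cache, and a display/driver reservation."""
    return vram_gib - weights_on_gpu_gib - kv_bytes / GIB - reserved_gib


if __name__ == "__main__":
    # All model numbers below are hypothetical placeholders for illustration.
    for n_ctx in (8192, 32768, 131072):
        kv = kv_cache_bytes(n_layers=40, n_kv_heads=4, head_dim=128, n_ctx=n_ctx)
        headroom = vram_headroom_gib(
            vram_gib=16.0,            # 5070 Ti VRAM
            weights_on_gpu_gib=11.0,  # hypothetical share of the quant kept on GPU
            kv_bytes=kv,
            reserved_gib=1.5,         # hypothetical desktop/display reservation
        )
        status = "ok" if headroom > 0 else "over budget"
        print(f"ctx={n_ctx:>6}  kv={kv / GIB:.2f} GiB  headroom={headroom:+.2f} GiB  {status}")
```

A negative headroom in an estimate like this is the regime where spilling or offload kicks in. That is consistent with the poster linking the freezes to offload and larger contexts, though it does not prove the cause.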
// TAGS
qwen · unsloth · gguf · quantization · local-llm · claude-code · moe
DISCOVERED
6h ago
2026-04-18
PUBLISHED
7h ago
2026-04-18
RELEVANCE
8/10
AUTHOR
AcrobaticChain1846