Qwen3.6 Q2_K_XL runs fast, freezes
This Reddit report says Unsloth's Qwen3.6-35B-A3B UD Q2_K_XL quant can reach 30-40 tk/s through Claude Code on a consumer desktop, but some 5070 Ti setups freeze the PC or graphics driver as prompt processing ends. The poster ties the failures to offload and larger contexts, suggesting a very fast but fragile local inference setup.
Strong model, but this reads less like a “model is broken” complaint and more like a very aggressive local deployment hitting the edge of consumer hardware stability.
- The win here is obvious: the poster is getting much better coding behavior than with a smaller Qwen3.5 9B quant.
- The failure mode is also telling: freezing after prompt completion usually points to VRAM pressure, driver instability, or offload/memory-management issues rather than pure token-generation slowdown.
- Q2_K_XL is attractive for speed/size, but at this scale it may simply be too fragile for some desktop GPU + display setups.
- Net: high upside, but this is the kind of setup that needs careful tuning and may still be unreliable in practice.
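
The VRAM-pressure point above can be made concrete with a rough budget check. The sketch below is a back-of-envelope estimate only, not the poster's method: it assumes every layer uses standard grouped-query attention with a 16-bit KV cache (which may not hold for this model), and every number in the usage lines (layer count, KV heads, head size, weight size, VRAM, reserved headroom) is a hypothetical placeholder to be replaced with values from the actual model card and GPU.

```python
GIB = 1024 ** 3


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_tokens, bytes_per_element=2):
    """Estimate KV cache size for standard attention.

    One key tensor and one value tensor (the leading factor 2) are stored
    per layer, per KV head, per token.
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_element


def fits_in_vram(weights_gib, kv_gib, vram_gib, reserved_gib=1.5):
    """Return True if weights + KV cache + reserved headroom fit in VRAM.

    reserved_gib is an assumed allowance for the desktop compositor,
    display buffers, and runtime scratch space.
    """
    return weights_gib + kv_gib + reserved_gib <= vram_gib


# Hypothetical placeholder values, NOT the real model configuration.
kv_gib = kv_cache_bytes(n_layers=40, n_kv_heads=4, head_dim=128, context_tokens=32768) / GIB
print(f"KV cache at 32k context: {kv_gib:.2f} GiB")

# Hypothetical 12 GiB of GPU-resident weights on an assumed 16 GiB card.
print("32k fits:", fits_in_vram(weights_gib=12.0, kv_gib=kv_gib, vram_gib=16.0))
print("64k fits:", fits_in_vram(weights_gib=12.0, kv_gib=2 * kv_gib, vram_gib=16.0))
```

Under these placeholder numbers the 32k context lands exactly at the limit and the 64k context does not fit, which is the kind of margin where a larger context or a different offload split could tip a GPU that also drives the display into instability.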
DISCOVERED
2026-04-18
PUBLISHED
2026-04-18
AUTHOR
AcrobaticChain1846