Qwen3.6 Q2_K_XL runs fast, freezes
REDDIT // 6h ago · MODEL RELEASE

This Reddit report says Unsloth's Qwen3.6-35B-A3B UD Q2_K_XL quant can reach 30-40 tokens/s through Claude Code on a consumer desktop, but on some RTX 5070 Ti setups the PC or graphics driver freezes as prompt processing ends. The poster ties the failures to offload and larger contexts, which suggests a local inference setup that is very fast but fragile.

// ANALYSIS

The model itself looks strong; the report reads less like a “model is broken” complaint and more like an aggressive local deployment hitting the limits of consumer hardware stability.

  • The win here is obvious: the poster is getting much better coding behavior than with a smaller Qwen3.5 9B quant.
  • The failure mode is also telling: a freeze right as prompt processing ends usually points to VRAM pressure, driver instability, or offload/memory-management issues rather than a simple slowdown in token generation.
  • Q2_K_XL is attractive for speed/size, but at this scale it may simply be too fragile for some desktop GPU + display setups.
  • Net: high upside, but this is the kind of setup that needs careful tuning and may still be unreliable in practice.
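
To make the VRAM-pressure point concrete, here is a rough back-of-the-envelope sketch of how context size eats into a GPU memory budget. It uses the standard KV-cache size formula for ordinary attention layers. Every number in it (layer count, KV heads, head dimension, weight size, VRAM size, reserve for the display) is a hypothetical placeholder, not the real Qwen3.6-35B-A3B configuration and not a figure from the Reddit post, and the helper names are made up for illustration.

```python
GIB = 1024 ** 3


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    """Bytes needed to hold keys and values for every layer at full context.

    Assumes plain attention layers with an unquantized fp16 cache
    (2 bytes per element); models with other layer types need less.
    """
    # Leading factor of 2: one tensor for keys, one for values.
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


def fits_in_vram(weights_gib, kv_gib, vram_gib, reserve_gib):
    """True if weights + KV cache + a reserve for display/driver fit in VRAM."""
    return weights_gib + kv_gib + reserve_gib <= vram_gib


# Hypothetical placeholder values, chosen only to illustrate the arithmetic.
N_LAYERS, N_KV_HEADS, HEAD_DIM = 40, 4, 128
WEIGHTS_GIB = 12.0   # assumed size of the quantized weights held on the GPU
VRAM_GIB = 16.0      # assumed card capacity
RESERVE_GIB = 1.5    # assumed headroom for desktop, driver, compute buffers

for ctx in (32768, 65536):
    kv_gib = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, ctx) / GIB
    ok = fits_in_vram(WEIGHTS_GIB, kv_gib, VRAM_GIB, RESERVE_GIB)
    print(f"ctx={ctx}: KV cache {kv_gib:.2f} GiB, fits={ok}")
```

Under these made-up numbers, doubling the context doubles the cache and tips the budget from just fitting to not fitting, which is the kind of edge the analysis above describes. Real headroom depends on the actual model config, cache quantization, and how many layers are offloaded.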
// TAGS
qwen · unsloth · gguf · quantization · local-llm · claude-code · moe

DISCOVERED

6h ago

2026-04-18

PUBLISHED

7h ago

2026-04-18

RELEVANCE

8/10

AUTHOR

AcrobaticChain1846