OPEN_SOURCE
REDDIT // 3h ago · BENCHMARK RESULT
Qwen3.6-27B regains 16GB fit, 110k context
By reverting a llama.cpp quantization change, the author trims the Qwen3.6-27B IQ4_XS build back to 14.7GB and keeps it practical on 16GB VRAM. The custom GGUF's perplexity benchmarks nearly match stock at both 65k and 110k context, so this reads like a real deployment fix rather than a quality tradeoff.
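As a quick sanity check on the fit claim, here is a minimal sketch of the headroom arithmetic, using the sizes reported in the post and assuming a 16GB card with nothing else modeled (no KV cache, compute buffers, or display overhead):

```python
# Headroom left on a 16GB card after loading each GGUF build.
# Sizes are from the post; KV cache, compute buffers, and the
# desktop's own VRAM use are deliberately not modeled.
VRAM_GB = 16.0
builds = {"stock IQ4_XS": 15.1, "reverted IQ4_XS": 14.7}

for name, weights_gb in builds.items():
    headroom = VRAM_GB - weights_gb
    print(f"{name}: {weights_gb:.1f}GB weights, {headroom:.1f}GB headroom")
```

That is 0.9GB versus 1.3GB of headroom, and the fits/doesn't-fit line presumably sits in between once KV cache and compute buffers are added on top of the weights.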
// ANALYSIS
This is a niche patch with outsized impact for local model runners: a 0.4GB packaging change is the difference between “fits comfortably” and “doesn’t fit” on consumer 16GB cards.
- The stock IQ4_XS build is 15.1GB, while the reverted variant lands at 14.7GB, which is enough to preserve the 16GB VRAM use case.
- Perplexity deltas are tiny at both 65k and 110k context, which supports the claim that the attn_qkv rollback restores size without meaningfully hurting quality.
- The KV cache tests suggest Qwen3.6-27B does not benefit much from asymmetric K-heavy tuning, so the V cache matters more than the turboquant_plus guidance would imply (see the sketch after this list).
- The Q3 comparison weakens the "just drop to Q3" argument for coding workflows, since the smaller quant still gives up some quality at long context.
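To make the symmetric-versus-asymmetric cache question concrete, here is a minimal sketch comparing KV-cache footprints for a few K/V type pairings. The q4_0 and q8_0 bit widths match llama.cpp's block formats; the layer count and per-token KV width are placeholders, since the post doesn't state Qwen3.6-27B's internals:

```python
# KV-cache footprint under different K/V cache quantizations.
# Bits per element: f16 = 16; q8_0 = 8.5 (34-byte blocks of 32);
# q4_0 = 4.5 (18-byte blocks of 32). These match llama.cpp's formats.
BITS = {"f16": 16.0, "q8_0": 8.5, "q4_0": 4.5}

def cache_gib(ctx: int, k_type: str, v_type: str,
              n_layers: int = 48, kv_dim: int = 1024) -> float:
    """Cache holds n_layers * kv_dim elements per token for K, and again
    for V. n_layers and kv_dim are placeholders, not Qwen3.6-27B specs."""
    per_token_bits = n_layers * kv_dim * (BITS[k_type] + BITS[v_type])
    return ctx * per_token_bits / 8 / 1024**3

for k, v in [("f16", "f16"), ("q8_0", "q8_0"), ("q8_0", "q4_0"), ("q4_0", "q8_0")]:
    print(f"K={k:5} V={v:5}: {cache_gib(110_000, k, v):.2f} GiB at 110k context")
```

The two asymmetric pairings cost identical memory, so choosing K-heavy versus V-heavy precision is purely a quality call; that is what makes the post's finding that the V cache matters more directly actionable.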
// TAGS
qwen3.6-27b · llm · benchmark · open-source · inference
DISCOVERED
3h ago · 2026-04-28
PUBLISHED
7h ago · 2026-04-28
RELEVANCE
9/10
AUTHOR
Pablo_the_brave