Qwen3.6-27B codes well, crawls past 52K tokens
This field report tests Qwen3.6-27B, running the 4-bit IQ4_XS Unsloth quant through llama-server on an M2 MacBook Pro with 32GB of RAM. The author says the model produced excellent code in OpenCode, but throughput collapsed as the context grew: from 80 tok/s prompt processing and 7.9 tok/s generation early on, down to just 4 tok/s prompt processing and 3.1 tok/s generation at roughly 52,000 tokens. No swapping was observed, so the slowdown appears to be memory-bandwidth bound rather than a RAM-capacity problem. Speculative ngram-mod decoding did not seem to help materially.
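For context, here is a rough way to reproduce the shape of that measurement. This is a hedged sketch, not the author's harness: it assumes llama-server is listening on its default port 8080 with the OpenAI-compatible API, and it pads the prompt with filler text rather than real agentic-coding context.

```python
# Rough probe of how llama-server throughput changes with context length.
# Assumptions (not from the report): server at localhost:8080 with the
# OpenAI-compatible API, and plain filler text as a stand-in for real
# agentic-coding context.
import time
import requests

URL = "http://localhost:8080/v1/chat/completions"  # llama-server default port

def probe(context_tokens: int, gen_tokens: int = 128) -> None:
    # Pad the prompt to roughly the target context size. "lorem ipsum " is
    # about 3 tokens per repetition; real tokenization varies by content.
    filler = "lorem ipsum " * (context_tokens // 3)
    t0 = time.time()
    r = requests.post(URL, json={
        "messages": [{"role": "user", "content": filler + "\nSummarize the above."}],
        "max_tokens": gen_tokens,
    }, timeout=3600)
    elapsed = time.time() - t0
    usage = r.json()["usage"]
    # Wall-clock rate conflates prompt processing and generation; it only
    # shows the trend as context grows, not exact per-phase numbers.
    print(f"{usage['prompt_tokens']:>6} prompt tok | "
          f"{usage['completion_tokens'] / elapsed:5.1f} tok/s overall")

for n in (1_000, 8_000, 32_000, 52_000):
    probe(n)
```

Because the wall-clock rate folds prompt processing into the denominator, it understates generation speed at long contexts, but the downward trend the author describes should still be visible.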
Hot take: this reads like a reminder that a dense 27B model can be genuinely useful locally, but the hardware ceiling is real and ugly.
- The model quality appears high enough to justify the pain: the generated code was described as excellent even with no extra steering after the initial prompt.
- The bottleneck is likely bandwidth, not memory pressure, which makes this a hardware-fit story more than a software-tuning story; a back-of-envelope check follows this list.
- The ngram-mod speculative setup seems to have added complexity without clear payoff in this workload; a sketch of the underlying idea also follows the list.
- The author’s “slow but effective self-hosted Sonnet” framing is plausible for dense 27B-class models.
- This is a practical report, not a synthetic benchmark: the value is in the real agentic coding behavior under long context.
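The bandwidth framing in the second bullet can be sanity-checked with arithmetic: on a dense model, every generated token streams the full weight set plus the entire KV cache through memory. The numbers below are assumptions for illustration (IQ4_XS at roughly 4.25 bits/weight, ~200 GB/s for an M2 Pro, and a hypothetical GQA layout), not the published Qwen3.6-27B config.

```python
# Back-of-envelope check on the bandwidth-bound hypothesis. Each generated
# token must read the full weight set plus the whole KV cache, so tok/s is
# capped near bandwidth / bytes_read_per_token. All architecture numbers
# below are illustrative assumptions, not the real Qwen3.6-27B config.
PARAMS = 27e9
BITS_PER_WEIGHT = 4.25                    # approximate for IQ4_XS
BANDWIDTH = 200e9                         # bytes/s, assuming an M2 Pro
LAYERS, KV_HEADS, HEAD_DIM = 48, 8, 128   # hypothetical GQA layout
KV_BYTES = 2                              # f16 cache entries

weights = PARAMS * BITS_PER_WEIGHT / 8
# K and V caches, per token of context, across all layers
kv_per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * KV_BYTES

for ctx in (1_000, 52_000):
    per_token_read = weights + ctx * kv_per_token
    print(f"ctx={ctx:>6}: ~{BANDWIDTH / per_token_read:4.1f} tok/s ceiling")
```

These ceilings are optimistic, since real decoders rarely sustain peak bandwidth, but the point is the ratio: the long-context ceiling falls well below the short-context one with no swapping involved, which matches the reported drop from 7.9 to 3.1 tok/s.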
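On the third bullet, for readers unfamiliar with the technique: ngram speculation drafts tokens by reusing text already present in the context, then has the model verify the draft in one batched pass. The sketch below is a conceptual illustration of that lookup step, with assumed parameters, not llama.cpp's actual ngram-mod implementation.

```python
# Minimal sketch of the idea behind ngram speculative ("prompt lookup")
# decoding: reuse token runs already in the context as free draft tokens.
# Conceptual illustration only, not llama.cpp's ngram-mod code.
def propose_draft(tokens: list[int], n: int = 3, max_draft: int = 8) -> list[int]:
    """If the last n tokens appeared earlier, propose what followed them."""
    if len(tokens) < n + 1:
        return []
    tail = tokens[-n:]
    # Scan history (excluding the tail itself) for the most recent match.
    for i in range(len(tokens) - n - 1, -1, -1):
        if tokens[i:i + n] == tail:
            return tokens[i + n:i + n + max_draft]
    return []

# Toy demo: repetitive streams (boilerplate code, log output) match often;
# novel prose rarely does, which is one way speculation fails to pay off.
history = [1, 2, 3, 4, 5, 9, 9, 1, 2, 3]
print(propose_draft(history))   # -> [4, 5, 9, 9, 1, 2, 3]
```

The target model then verifies the drafted tokens in a single batched forward pass and keeps only the accepted prefix. When the acceptance rate is low, verification overhead can cancel the savings, which would be consistent with the author seeing no material speedup here.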
DISCOVERED: 2026-04-29
PUBLISHED: 2026-04-28
AUTHOR: boutell