OPEN_SOURCE
REDDIT // MODEL RELEASE
Qwen3.6-35B-A3B pushes local coding limits
A LocalLLaMA user asks whether any other Qwen-tier local models will fit a Ryzen 9 5980HX, RX 6800M, and 16 GB RAM box better than Qwen3.6-35B-A3B-Q4_K_M, which already runs at about 17 t/s in llama.cpp over Vulkan. The practical question is whether a smaller sibling can deliver better coding quality per token without breaking the local workflow.
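The "fit" question can be sanity-checked with back-of-the-envelope arithmetic. A minimal sketch, assuming Q4_K_M averages roughly 4.8 bits per weight (a typical llama.cpp figure) and taking the 35B parameter count at face value from the model name; the 12 GiB VRAM figure is the RX 6800M's spec:

```python
# Rough memory-footprint estimate for a quantized GGUF checkpoint.
# Assumptions: Q4_K_M ~= 4.8 bits/weight on average; parameter count
# read off the model name; RX 6800M has 12 GiB of VRAM.

def gguf_size_gib(params_billion: float, bits_per_weight: float = 4.8) -> float:
    """Approximate in-memory size of a quantized checkpoint in GiB."""
    total_bits = params_billion * 1e9 * bits_per_weight
    return total_bits / 8 / 2**30

size = gguf_size_gib(35)
fits_in_vram = size <= 12.0
print(f"~{size:.1f} GiB of weights; fits entirely in 12 GiB VRAM: {fits_in_vram}")
```

Under these assumptions the weights alone land near 20 GiB, so the model must split across VRAM and system RAM; that is exactly the regime where a sparse MoE with only ~3B active parameters per token keeps generation speed tolerable.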
// ANALYSIS
35B sparse MoE is a strong result for that hardware, but the more interesting candidate is probably the 27B dense sibling rather than a bigger quant.
- The reported 17 t/s suggests the machine is already usable for local inference, but 65k context and large KV cache settings will matter as much as raw parameter count.
- Product Hunt frames Qwen3.6-27B as the sweet-spot open dense model for coding agents, and Qwen’s own blog positions it as flagship-level coding in a 27B dense checkpoint.
- For lightweight coding, dense models tend to feel more predictable than MoE checkpoints once memory bandwidth and context pressure start to dominate.
- The real test is code completion, refactoring, and instruction-following quality, not just throughput or VRAM fit.
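The context-pressure point above can be made concrete: KV-cache memory grows linearly with context length and can rival the quantized weights themselves. A minimal sketch, assuming a hypothetical grouped-query-attention config (48 layers, 8 KV heads, head dim 128 — Qwen3.6's actual config is not given in the post) and an unquantized fp16 cache:

```python
# Hedged KV-cache size estimate. Layer/head/dim values are placeholder
# assumptions, not Qwen3.6's published config; fp16 cache (2 bytes/elem),
# no KV quantization.

def kv_cache_gib(ctx_len: int, n_layers: int = 48, n_kv_heads: int = 8,
                 head_dim: int = 128, bytes_per_elem: int = 2) -> float:
    """2 (K and V) * layers * kv_heads * head_dim * tokens * dtype bytes."""
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem
    return total_bytes / 2**30

print(f"65k context: ~{kv_cache_gib(65536):.1f} GiB of KV cache")
print(f"8k context:  ~{kv_cache_gib(8192):.1f} GiB of KV cache")
```

Even with these modest placeholder numbers, a full 65k-token cache costs on the order of 12 GiB, which is why cache settings (and KV quantization in llama.cpp) matter as much as picking a smaller checkpoint.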
// TAGS
qwen3 · qwen3-6-35b-a3b · llm · open-weights · moe · quantization · ai-coding · coding-agent
DISCOVERED
1d ago
2026-05-01
PUBLISHED
1d ago
2026-05-01
RELEVANCE
9/10
AUTHOR
Houston_NeverMind