OPEN_SOURCE
REDDIT // MODEL RELEASE
Qwen3.6-35B-A3B pushes local coding limits
A LocalLLaMA user asks whether any other Qwen-tier local models will fit a Ryzen 9 5980HX, RX 6800M, and 16 GB RAM box better than Qwen3.6-35B-A3B-Q4_K_M, which already runs at about 17 t/s in llama.cpp over Vulkan. The practical question is whether a smaller sibling can deliver better coding quality per token without breaking the local workflow.
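The "fit" question can be sanity-checked with back-of-the-envelope arithmetic. A minimal sketch, assuming Q4_K_M averages roughly 4.8 bits per weight (a typical llama.cpp figure) and taking the 35B parameter count at face value from the model name; the 12 GiB VRAM figure is the RX 6800M's spec:

```python
# Rough memory-footprint estimate for a quantized GGUF checkpoint.
# Assumptions: Q4_K_M ~= 4.8 bits/weight on average; parameter count
# read off the model name; RX 6800M has 12 GiB of VRAM.

def gguf_size_gib(params_billion: float, bits_per_weight: float = 4.8) -> float:
    """Approximate in-memory size of a quantized checkpoint in GiB."""
    total_bits = params_billion * 1e9 * bits_per_weight
    return total_bits / 8 / 2**30

size = gguf_size_gib(35)
fits_in_vram = size <= 12.0
print(f"~{size:.1f} GiB of weights; fits entirely in 12 GiB VRAM: {fits_in_vram}")
```

Under these assumptions the weights alone land near 20 GiB, so the model must split across VRAM and system RAM; that is exactly the regime where a sparse MoE with only ~3B active parameters per token keeps generation speed tolerable.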
// ANALYSIS
35B sparse MoE is a strong result for that hardware, but the more interesting candidate is probably the 27B dense sibling rather than a bigger quant.
- The reported 17 t/s suggests the machine is already usable for local inference, but 65k context and large KV cache settings will matter as much as raw parameter count.
- Product Hunt frames Qwen3.6-27B as the sweet-spot open dense model for coding agents, and Qwen’s own blog positions it as flagship-level coding in a 27B dense checkpoint.
- For lightweight coding, dense models tend to feel more predictable than MoE checkpoints once memory bandwidth and context pressure start to dominate.
- The real test is code completion, refactoring, and instruction-following quality, not just throughput or VRAM fit.
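The context-pressure point above can be made concrete: KV-cache memory grows linearly with context length and can rival the quantized weights themselves. A minimal sketch, assuming a hypothetical grouped-query-attention config (48 layers, 8 KV heads, head dim 128 — Qwen3.6's actual config is not given in the post) and an unquantized fp16 cache:

```python
# Hedged KV-cache size estimate. Layer/head/dim values are placeholder
# assumptions, not Qwen3.6's published config; fp16 cache (2 bytes/elem),
# no KV quantization.

def kv_cache_gib(ctx_len: int, n_layers: int = 48, n_kv_heads: int = 8,
                 head_dim: int = 128, bytes_per_elem: int = 2) -> float:
    """2 (K and V) * layers * kv_heads * head_dim * tokens * dtype bytes."""
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem
    return total_bytes / 2**30

print(f"65k context: ~{kv_cache_gib(65536):.1f} GiB of KV cache")
print(f"8k context:  ~{kv_cache_gib(8192):.1f} GiB of KV cache")
```

Even with these modest placeholder numbers, a full 65k-token cache costs on the order of 12 GiB, which is why cache settings (and KV quantization in llama.cpp) matter as much as picking a smaller checkpoint.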
// TAGS
qwen3 · qwen3-6-35b-a3b · llm · open-weights · moe · quantization · ai-coding · coding-agent
DISCOVERED
1d ago
2026-05-01
PUBLISHED
1d ago
2026-05-01
RELEVANCE
9/10
AUTHOR
Houston_NeverMind