OPEN_SOURCE ↗
REDDIT · 3h ago · BENCHMARK RESULT
Qwen3.6-35B-A3B Q4_K_M posts solid CPU evals
This is a CPU-only evaluation of the Q4_K_M GGUF quantization of Qwen3.6-35B-A3B, run via `llama-cpp-python` on 32 vCPUs and 125 GB RAM. Across 1,264 samples, it scored 47.56% on HumanEval, 74.30% on HellaSwag, and 46.00% on BFCL, with reported throughput of 22 tokens/sec.
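The CPU-only setup described above can be sketched with `llama-cpp-python` as follows. The model path, prompt, and timing helper are illustrative assumptions, not the poster's actual harness; the `Llama(...)` parameters shown (`n_threads`, `n_ctx`, `n_gpu_layers=0`) are the standard knobs for a GPU-less host.

```python
import time

def tokens_per_sec(n_tokens: int, elapsed_s: float) -> float:
    """Throughput in tokens/sec; guards against zero elapsed time."""
    return n_tokens / elapsed_s if elapsed_s > 0 else 0.0

def run_cpu_generation(model_path: str, prompt: str, n_threads: int = 32):
    """Load a GGUF quant on CPU and time a single generation."""
    from llama_cpp import Llama
    # n_gpu_layers=0 keeps every layer on CPU, matching the no-GPU host
    # in the benchmark (32 vCPUs, 125 GB RAM).
    llm = Llama(
        model_path=model_path,
        n_threads=n_threads,
        n_ctx=4096,
        n_gpu_layers=0,
    )
    start = time.perf_counter()
    out = llm(prompt, max_tokens=256)
    elapsed = time.perf_counter() - start
    n_tok = out["usage"]["completion_tokens"]
    return out["choices"][0]["text"], tokens_per_sec(n_tok, elapsed)

if __name__ == "__main__":
    # Hypothetical local filename for the Q4_K_M quant.
    text, tps = run_cpu_generation(
        "qwen3.6-35b-a3b-q4_k_m.gguf",
        "Write a Python function that reverses a string.",
    )
    print(f"{tps:.1f} tokens/sec")
```

At the reported ~22 tokens/sec, a 256-token completion takes roughly 12 seconds, which is why the analysis below frames this as viable for local experimentation rather than high-throughput serving.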
// ANALYSIS
The headline result is that a 35B/3B MoE model stays quite usable after quantization on CPU, but the weaknesses show up exactly where you’d expect: precise code and tool-use tasks degrade faster than broad commonsense reasoning.
- 22 tokens/sec on a no-GPU host is the real story here; this is practical enough for local experimentation and some interactive workflows.
- HellaSwag at 74.3% suggests the quant preserves general language competence better than task-specific exactness.
- HumanEval at 47.56% and BFCL at 46% imply that quantization still hurts structured output and function-calling reliability.
- The setup matters: this is a realistic local deployment path, not a cherry-picked GPU benchmark.
- The eval is useful as a deployment signal for people deciding whether Qwen3.6-35B-A3B is viable on commodity CPU infrastructure.
// TAGS
qwen3.6-35b-a3b · benchmark · inference · llm · open-weights · self-hosted · ai-coding
DISCOVERED
3h ago
2026-04-18
PUBLISHED
5h ago
2026-04-18
RELEVANCE
9 / 10
AUTHOR
gvij