Qwen3.6-35B-A3B Q4_K_M posts solid CPU evals
OPEN_SOURCE
REDDIT // 3h ago // BENCHMARK RESULT


This is a CPU-only evaluation of the Q4_K_M GGUF quantization of Qwen3.6-35B-A3B, run via `llama-cpp-python` on 32 vCPUs and 125 GB RAM. Across 1,264 samples, it scored 47.56% on HumanEval, 74.30% on HellaSwag, and 46.00% on BFCL, with reported throughput of 22 tokens/sec.
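The post does not include the measurement harness. As a rough sketch of how such a throughput number could be reproduced with `llama-cpp-python`, the following assumes a local GGUF file, prompt, and sampling settings that are illustrative, not the poster's exact configuration:

```python
# Hedged sketch: CPU-only throughput measurement with llama-cpp-python.
# The model filename, prompt, and max_tokens below are assumptions.
import time


def tokens_per_sec(n_tokens: int, elapsed_s: float) -> float:
    """Throughput as generated tokens per wall-clock second."""
    return n_tokens / elapsed_s


def measure_throughput(model_path: str, n_threads: int = 32) -> float:
    """Load a GGUF model on CPU and time a single completion."""
    from llama_cpp import Llama  # pip install llama-cpp-python

    llm = Llama(
        model_path=model_path,  # e.g. a Q4_K_M GGUF of Qwen3.6-35B-A3B
        n_ctx=4096,
        n_threads=n_threads,    # matches the 32-vCPU host in the post
    )
    t0 = time.perf_counter()
    out = llm("Write a Python function that reverses a string.", max_tokens=256)
    elapsed = time.perf_counter() - t0
    return tokens_per_sec(out["usage"]["completion_tokens"], elapsed)
```

Per the post's figures, 256 generated tokens at 22 tok/s would take roughly 11.6 seconds, which is the interactivity trade-off the analysis below weighs.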

// ANALYSIS

The headline result is that a 35B-total/3B-active MoE model stays quite usable after quantization on CPU, but the weaknesses show up exactly where you’d expect: precise code and tool-use tasks degrade faster than broad commonsense reasoning.

  • 22 tokens/sec on a no-GPU host is the real story here; this is practical enough for local experimentation and some interactive workflows.
  • HellaSwag at 74.3% suggests the quant preserves general language competence better than task-specific exactness.
  • HumanEval at 47.56% and BFCL at 46% imply that quantization still hurts structured output and function-calling reliability.
  • The setup matters: this is a realistic local deployment path, not a cherry-picked GPU benchmark.
  • The eval is useful as a deployment signal for people deciding whether Qwen3.6-35B-A3B is viable on commodity CPU infrastructure.
// TAGS
qwen3.6-35b-a3b · benchmark · inference · llm · open-weights · self-hosted · ai-coding

DISCOVERED

3h ago

2026-04-18

PUBLISHED

5h ago

2026-04-18

RELEVANCE

9/10

AUTHOR

gvij