OPEN_SOURCE ↗
REDDIT · 3h ago · BENCHMARK RESULT
Qwen3.6-35B-A3B Q4_K_M posts solid CPU evals
This is a CPU-only evaluation of the Q4_K_M GGUF quantization of Qwen3.6-35B-A3B, run via `llama-cpp-python` on 32 vCPUs and 125 GB RAM. Across 1,264 samples, it scored 47.56% on HumanEval, 74.30% on HellaSwag, and 46.00% on BFCL, with reported throughput of 22 tokens/sec.
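The CPU-only setup described above can be sketched with `llama-cpp-python` as follows. The model path, prompt, and timing helper are illustrative assumptions, not the poster's actual harness; the `Llama(...)` parameters shown (`n_threads`, `n_ctx`, `n_gpu_layers=0`) are the standard knobs for a GPU-less host.

```python
import time

def tokens_per_sec(n_tokens: int, elapsed_s: float) -> float:
    """Throughput in tokens/sec; guards against zero elapsed time."""
    return n_tokens / elapsed_s if elapsed_s > 0 else 0.0

def run_cpu_generation(model_path: str, prompt: str, n_threads: int = 32):
    """Load a GGUF quant on CPU and time a single generation."""
    from llama_cpp import Llama
    # n_gpu_layers=0 keeps every layer on CPU, matching the no-GPU host
    # in the benchmark (32 vCPUs, 125 GB RAM).
    llm = Llama(
        model_path=model_path,
        n_threads=n_threads,
        n_ctx=4096,
        n_gpu_layers=0,
    )
    start = time.perf_counter()
    out = llm(prompt, max_tokens=256)
    elapsed = time.perf_counter() - start
    n_tok = out["usage"]["completion_tokens"]
    return out["choices"][0]["text"], tokens_per_sec(n_tok, elapsed)

if __name__ == "__main__":
    # Hypothetical local filename for the Q4_K_M quant.
    text, tps = run_cpu_generation(
        "qwen3.6-35b-a3b-q4_k_m.gguf",
        "Write a Python function that reverses a string.",
    )
    print(f"{tps:.1f} tokens/sec")
```

At the reported ~22 tokens/sec, a 256-token completion takes roughly 12 seconds, which is why the analysis below frames this as viable for local experimentation rather than high-throughput serving.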
// ANALYSIS
The headline result is that a 35B/3B MoE model stays quite usable after quantization on CPU, but the weaknesses show up exactly where you’d expect: precise code and tool-use tasks degrade faster than broad commonsense reasoning.
- 22 tokens/sec on a no-GPU host is the real story here; this is practical enough for local experimentation and some interactive workflows.
- HellaSwag at 74.3% suggests the quant preserves general language competence better than task-specific exactness.
- HumanEval at 47.56% and BFCL at 46% imply that quantization still hurts structured output and function-calling reliability.
- The setup matters: this is a realistic local deployment path, not a cherry-picked GPU benchmark.
- The eval is useful as a deployment signal for people deciding whether Qwen3.6-35B-A3B is viable on commodity CPU infrastructure.
// TAGS
qwen3.6-35b-a3b · benchmark · inference · llm · open-weights · self-hosted · ai-coding
DISCOVERED
3h ago
2026-04-18
PUBLISHED
5h ago
2026-04-18
RELEVANCE
9 / 10
AUTHOR
gvij