Qwen3.6-35B-A3B benchmark: FP8 + MTP tops AWQ Q4
A local benchmark on Qwen3.6-35B-A3B found FP8 + MTP outperforming AWQ Q4 across serial and concurrent decode, with better latency at higher concurrency. The result suggests weight quantization alone is not a reliable proxy for real serving speed.
The interesting part here is that the serving stack matters as much as the weight format. Once MTP and other runtime optimizations enter the picture, a “heavier” precision setup can still beat a lower-bit quantized one.
- Serial decode came out at 110 tok/s for FP8 + MTP versus 91.8 tok/s for AWQ Q4
- At concurrency 4, FP8 + MTP cleared 400+ tok/s while Q4 landed at 248 tok/s
- At concurrency 8, FP8 + MTP hit 484 tok/s versus 250 tok/s for Q4
- p90 latency at concurrency 8 improved from about 5.9s to about 3.4s
- The comparison is not perfectly apples-to-apples because the Q4 setup lacked expert parallelism (EP) and MTP, which likely explains a lot of the gap
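The reported figures can be reduced to speedup and per-stream throughput with a few lines of arithmetic; the numbers below are copied from the post, and the tabulation itself is just an illustrative sketch:

```python
# Reported decode throughput (tok/s): concurrency -> (FP8 + MTP, AWQ Q4).
results = {1: (110.0, 91.8), 4: (400.0, 248.0), 8: (484.0, 250.0)}

for conc, (fp8, q4) in sorted(results.items()):
    speedup = fp8 / q4
    # Per-stream throughput shows how each setup scales as load grows:
    # Q4 barely gains from concurrency 4 to 8, while FP8 + MTP keeps climbing.
    print(f"concurrency {conc}: {speedup:.2f}x speedup, "
          f"per-stream {fp8 / conc:.1f} vs {q4 / conc:.1f} tok/s")
```

The speedup widening from roughly 1.2x serially to roughly 1.9x at concurrency 8 is consistent with the post's point that the gap comes from the serving stack, not the weight format alone.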
Discovered: 2026-05-08 · Published: 2026-05-08 · Author: Motor_Match_621