// BENCHMARK RESULT

Qwen3.6-35B-A3B benchmark tops AWQ Q4

A local benchmark on Qwen3.6-35B-A3B found FP8 + MTP (multi-token prediction) outperforming AWQ Q4 across both serial and concurrent decode, with better latency at higher concurrency. The result suggests that weight quantization alone is not a reliable proxy for real serving speed.

// ANALYSIS

The interesting part here is that the serving stack matters as much as the weight format. Once MTP and other runtime optimizations enter the picture, a “heavier” precision setup can still beat a lower-bit quantized one.
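
For intuition on why MTP can move the needle this much: if MTP is treated like classic speculative decoding with draft length k and a per-token acceptance rate alpha, the expected number of tokens committed per target-model step follows the standard geometric-series formula. This is a back-of-envelope sketch under that assumption; neither alpha nor k is reported in the benchmark, and the values below are illustrative only.

    # ASSUMPTION: MTP behaves like classic speculative decoding with
    # draft length k and i.i.d. per-token acceptance rate alpha, so the
    # expected tokens committed per verification step is
    # (1 - alpha**(k + 1)) / (1 - alpha).
    def expected_tokens_per_step(alpha: float, k: int) -> float:
        """Expected accepted tokens (including the bonus token) per step."""
        return (1 - alpha ** (k + 1)) / (1 - alpha)

    # Illustrative values only; the benchmark reports neither alpha nor k.
    for alpha in (0.6, 0.8, 0.9):
        print(f"alpha={alpha}: {expected_tokens_per_step(alpha, k=3):.2f} tokens/step")

Even modest acceptance rates buy well over one token per verification step, which is consistent with the FP8 + MTP run out-decoding a lower-bit setup that commits one token at a time. The reported numbers bear this out: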

  • Serial decode came out at 110 tok/s for FP8 + MTP versus 91.8 tok/s for AWQ Q4
  • At concurrency 4, FP8 + MTP cleared 400+ tok/s while Q4 landed at 248 tok/s
  • At concurrency 8, FP8 + MTP hit 484 tok/s versus 250 tok/s for Q4
  • p90 latency at concurrency 8 was about 3.4s for FP8 + MTP versus about 5.9s for Q4
  • The comparison is not perfectly apples-to-apples: the Q4 setup ran without expert parallelism (EP) or MTP, which likely explains much of the gap (a sketch of the measurement method follows below)
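
Numbers like these are straightforward to reproduce against any OpenAI-compatible endpoint. Below is a minimal sketch of such a harness, not the original poster's setup: it assumes a hypothetical local server at localhost:8000, a placeholder model name and prompt, and a server that returns completion token counts in the usual "usage" field. Aggregate completion tokens over wall time gives tok/s, and per-request latencies give the p90 figure.

    # Minimal concurrency benchmark sketch (not the original harness).
    # ASSUMPTIONS: an OpenAI-compatible server at BASE_URL that reports
    # "usage.completion_tokens"; model name, prompt, and max_tokens are
    # placeholders.
    import asyncio
    import statistics
    import time

    import httpx

    BASE_URL = "http://localhost:8000/v1"  # hypothetical local server
    MODEL = "Qwen3.6-35B-A3B"              # placeholder served-model name
    PROMPT = "Write a function that merges two sorted lists."
    MAX_TOKENS = 512

    async def one_request(client: httpx.AsyncClient) -> tuple[float, int]:
        """Send one completion request; return (latency_s, completion_tokens)."""
        t0 = time.perf_counter()
        r = await client.post(
            f"{BASE_URL}/completions",
            json={"model": MODEL, "prompt": PROMPT, "max_tokens": MAX_TOKENS},
        )
        r.raise_for_status()
        usage = r.json().get("usage", {})
        return time.perf_counter() - t0, usage.get("completion_tokens", 0)

    async def run_level(concurrency: int, rounds: int = 4) -> None:
        """Fire `concurrency` simultaneous requests `rounds` times; report
        aggregate decode throughput and p90 request latency."""
        latencies: list[float] = []
        total_tokens = 0
        async with httpx.AsyncClient(timeout=300.0) as client:
            t0 = time.perf_counter()
            for _ in range(rounds):
                results = await asyncio.gather(
                    *(one_request(client) for _ in range(concurrency))
                )
                for latency, tokens in results:
                    latencies.append(latency)
                    total_tokens += tokens
            wall = time.perf_counter() - t0
        p90 = statistics.quantiles(latencies, n=10)[8]  # 90th percentile
        print(f"concurrency={concurrency}: "
              f"{total_tokens / wall:.1f} tok/s, p90={p90:.2f}s")

    if __name__ == "__main__":
        for level in (1, 4, 8):  # serial, then the two reported concurrency levels
            asyncio.run(run_level(level))

Note that throughput measured this way includes prompt processing in the wall clock; a stricter decode-only number would stream tokens and time only the generation phase, which is one reason single-run results like these can diverge across reports.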
// TAGS
qwen3.6-35b-a3b · llm · quantization · inference · benchmark · ai-coding · coding-agent · open-weights

DISCOVERED: 2026-05-08 (1h ago)
PUBLISHED: 2026-05-08 (3h ago)
RELEVANCE: 8/10
AUTHOR: Motor_Match_621