Qwen3.5-35B Matches Q4 on MI50s
REDDIT · 5d ago · BENCHMARK RESULT

On dual AMD MI50s, a community benchmark reports Qwen3.5-35B-A3B at Q8_0 reaching 55 tok/s generation and 1100 tok/s prefill, nearly matching a Q4_K_XL run. The post suggests that older AMD hardware and software overhead are flattening the speedup normally expected from more aggressive quantization.
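
To put the two reported rates in context, here is a rough arithmetic sketch of how a single request's time splits between prefill and decode. The 55 tok/s and 1100 tok/s figures come from the post; the request size (2000 prompt tokens, 500 output tokens) and the function name `phase_times` are assumptions chosen only for illustration.

```python
def phase_times(prompt_tokens, output_tokens, prefill_tps=1100.0, decode_tps=55.0):
    """Return (prefill_seconds, decode_seconds) for one request.

    Defaults are the rates reported in the post; the request size passed
    in below is a hypothetical example, not a measured workload.
    """
    prefill_s = prompt_tokens / prefill_tps
    decode_s = output_tokens / decode_tps
    return prefill_s, decode_s


# Hypothetical request: 2000-token prompt, 500 generated tokens.
prefill_s, decode_s = phase_times(prompt_tokens=2000, output_tokens=500)
total = prefill_s + decode_s
print(f"prefill {prefill_s:.2f}s, decode {decode_s:.2f}s, "
      f"decode share {decode_s / total:.0%}")
# prints: prefill 1.82s, decode 9.09s, decode share 83%
```

Under these assumed sizes, decode accounts for most of the wall-clock time, which is why a quant level that leaves generation speed unchanged matters more than one that only shifts prefill.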

// ANALYSIS

This reads less like a surprise model win and more like a reminder that local inference performance is often limited by kernels, memory movement, and device topology, not just bit width.

  • Q8_0 keeping pace with Q4_K_XL on generation suggests decode is not limited by weight size or arithmetic throughput alone; fixed per-token overhead is the likelier bottleneck.
  • The prefill jump on two GPUs shows where parallelism still matters: prompt processing benefits far more than token-by-token decoding.
  • MI50-era AMD cards are exactly where inference stacks tend to be least polished, so quantization gains can get swallowed by software inefficiency.
  • For local model runners, this is a useful reminder to benchmark `prefill` and `decode` separately before choosing a quant level.
  • Qwen3.5-35B-A3B still looks attractive for multi-GPU local deployment, especially if you can afford a higher-quality quant without losing real-world speed.
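
Measuring `prefill` and `decode` separately can be sketched as below. This is a minimal harness, not the method used in the post: it assumes your runner exposes generation as an iterable that yields one item per generated token, and it treats the arrival of the first token as the end of prefill (so the first decode step is counted as prefill). `measure_phases` and `fake_stream` are hypothetical names; the `clock` parameter exists only so the timing logic can be checked without real hardware.

```python
import time


def measure_phases(stream, prompt_tokens, clock=time.perf_counter):
    """Split a token stream's wall time into prefill and decode rates.

    Assumption: `stream` yields one item per generated token, and the
    first item marks the end of prompt processing.
    """
    start = clock()
    first_token_at = None
    last_token_at = None
    generated = 0
    for _ in stream:
        last_token_at = clock()
        if first_token_at is None:
            first_token_at = last_token_at
        generated += 1
    if first_token_at is None:
        raise ValueError("stream produced no tokens")

    prefill_s = first_token_at - start
    decode_s = last_token_at - first_token_at
    return {
        "prefill_tps": prompt_tokens / prefill_s if prefill_s > 0 else float("inf"),
        # Tokens after the first, divided by the time they took to arrive.
        "decode_tps": (generated - 1) / decode_s if decode_s > 0 else float("nan"),
    }


def fake_stream(n_tokens, prefill_delay=0.05, token_delay=0.01):
    """Stand-in for a real runner: sleeps instead of doing inference."""
    time.sleep(prefill_delay)
    for i in range(n_tokens):
        if i > 0:
            time.sleep(token_delay)
        yield i


if __name__ == "__main__":
    print(measure_phases(fake_stream(6), prompt_tokens=100))
```

Run the same prompt and output length against each quant level and compare the two rates independently; a quant that wins on one phase can tie or lose on the other.
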
// TAGS
qwen3.5-35b-a3b · llm · benchmark · gpu · inference · open-source

DISCOVERED

5d ago

2026-04-06

PUBLISHED

5d ago

2026-04-06

RELEVANCE

8/10

AUTHOR

Far-Low-4705