OPEN_SOURCE
REDDIT · 36d ago · BENCHMARK RESULT

Qwen3.5-35B quants split ROCm, Vulkan

A fresh r/LocalLLaMA benchmark on dual MI50 32GB cards compares Qwen3.5-35B-A3B quant speeds in llama.cpp across the ROCm and Vulkan backends. Vulkan wins prompt processing, ROCm wins token generation, and Q4_0/Q4_1 still come out as the fastest quant formats overall.

// ANALYSIS

This is the kind of benchmark local AI developers actually care about: not abstract model hype, but concrete backend tradeoffs on real AMD hardware.

  • Vulkan starts out clearly ahead on prompt ingestion, then converges toward ROCm as context grows
  • ROCm holds a consistent lead on token generation, which matters more for long interactive sessions
  • Q4_0 and Q4_1 staying on top suggests older, leaner quant formats still dominate when pure speed is the goal
  • The gap between bartowski's IQ4_NL and Unsloth's UD-IQ4_NL shows quant packaging and implementation details can materially affect throughput
  • Flash attention proving faster across the board here is a useful practical takeaway for llama.cpp users tuning MI50 setups
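
The backend and flash-attention comparisons above can be reproduced with llama.cpp's bundled `llama-bench` tool. A minimal sketch, assuming a recent llama.cpp checkout and a locally downloaded GGUF file (the model filename below is a placeholder, not the exact quant from the post):

```shell
# Build llama.cpp twice, once per backend (standard CMake flags
# from the llama.cpp build docs; adjust paths for your setup).
cmake -B build-rocm -DGGML_HIP=ON && cmake --build build-rocm -j      # ROCm/HIP
cmake -B build-vulkan -DGGML_VULKAN=ON && cmake --build build-vulkan -j  # Vulkan

# Benchmark the same quant on each backend:
#   -p  prompt-processing sizes (short vs long context, to see the convergence)
#   -n  token-generation length
#   -fa flash attention off/on
#   -ngl 99 offloads all layers to the GPUs
./build-rocm/bin/llama-bench   -m Qwen3.5-35B-A3B-Q4_0.gguf \
    -p 512,4096 -n 128 -fa 0,1 -ngl 99
./build-vulkan/bin/llama-bench -m Qwen3.5-35B-A3B-Q4_0.gguf \
    -p 512,4096 -n 128 -fa 0,1 -ngl 99
```

Comparing the pp (prompt processing) and tg (token generation) rows across the two builds is exactly the ROCm-vs-Vulkan split the benchmark reports.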
// TAGS
qwen3-5-35b-a3b · llm · benchmark · gpu · inference · open-source

DISCOVERED

36d ago

2026-03-07

PUBLISHED

36d ago

2026-03-06

RELEVANCE

7/10

AUTHOR

OUT_OF_HOST_MEMORY