OPEN_SOURCE
REDDIT · 36d ago · BENCHMARK RESULT
Qwen3.5-35B quants split ROCm, Vulkan
A fresh r/LocalLLaMA benchmark on dual MI50 32GB cards compares Qwen3.5-35B-A3B quant speeds in llama.cpp across ROCm and Vulkan. Vulkan wins prompt processing, ROCm wins token generation, and Q4_0/Q4_1 still come out as the fastest quant options overall.
// ANALYSIS
This is the kind of benchmark local AI developers actually care about: not abstract model hype, but concrete backend tradeoffs on real AMD hardware.
- Vulkan starts out clearly ahead on prompt ingestion, then converges toward ROCm as context grows
- ROCm holds a consistent lead on token generation, which matters more for long interactive sessions
- Q4_0 and Q4_1 staying on top suggests older, leaner quant formats still dominate when pure speed is the goal
- The gap between bartowski's IQ4_NL and Unsloth's UD-IQ4_NL shows quant packaging and implementation details can materially affect throughput
- Flash attention looking universally faster here is a useful practical takeaway for llama.cpp users tuning MI50 setups
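For anyone wanting to reproduce the comparison on their own hardware, a minimal `llama-bench` sketch follows. The model filename is a placeholder for whichever quant is being tested, and it assumes llama.cpp has been built twice, once with ROCm/HIP and once with Vulkan, so the same command can be run against each build:

```shell
# Sweep prompt processing (pp) and token generation (tg) with flash
# attention off and on. llama-bench expands comma-separated values into
# every combination, so one run covers all settings for this backend.
# Model path and layer count are placeholders -- adjust for your setup.
./llama-bench \
  -m models/Qwen3.5-35B-A3B-Q4_0.gguf \
  -ngl 99 \
  -fa 0,1 \
  -p 512,2048,8192 \
  -n 128
```

Running the identical invocation under each backend build, then comparing the pp rows (prompt processing) against the tg rows (token generation), is what surfaces the Vulkan-vs-ROCm split described above.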
// TAGS
qwen3-5-35b-a3b · llm · benchmark · gpu · inference · open-source
DISCOVERED
36d ago
2026-03-07
PUBLISHED
36d ago
2026-03-06
RELEVANCE
7/10
AUTHOR
OUT_OF_HOST_MEMORY