Qwen3.5 Benchmarks Split ROCm, Vulkan
Independent tests of Qwen3.5 on an M5 Max MacBook Pro, an M1 Max Mac Studio, and AMD Radeon GPUs compared MLX, ROCm, and AMDVLK across single- and dual-GPU setups. The punchline is workload shape: Vulkan wins most single-GPU generation, ROCm dominates dense-model prefill, and the 122B dual-GPU case flips back to ROCm.
Backend choice matters more than raw GPU class here. Qwen3.5 flips winners depending on whether you're in prefill, decode, or multi-GPU mode.
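A rough latency model shows why the phase split matters: end-to-end time is roughly prompt tokens divided by prefill throughput, plus generated tokens divided by decode throughput. The sketch below is illustrative only. The function name and every throughput figure are hypothetical placeholders, not measurements from these tests.

```python
def request_seconds(prompt_tokens, output_tokens, prefill_tps, decode_tps):
    """Estimate wall-clock seconds for one request from per-phase throughput."""
    return prompt_tokens / prefill_tps + output_tokens / decode_tps


# Hypothetical backends (made-up numbers): A is faster at prefill, B at decode.
backend_a = {"prefill_tps": 2000.0, "decode_tps": 20.0}
backend_b = {"prefill_tps": 500.0, "decode_tps": 25.0}

# Long prompt, short answer (RAG-like): prefill time dominates.
rag_a = request_seconds(32000, 200, **backend_a)
rag_b = request_seconds(32000, 200, **backend_b)

# Short prompt, long answer (chat-like): decode time dominates.
chat_a = request_seconds(200, 2000, **backend_a)
chat_b = request_seconds(200, 2000, **backend_b)

print(f"RAG-like:  A={rag_a:.1f}s  B={rag_b:.1f}s")
print(f"Chat-like: A={chat_a:.1f}s  B={chat_b:.1f}s")
```

With these placeholder rates the prefill-strong backend wins the long-prompt request and the decode-strong backend wins the long-generation request, so the same hardware can rank differently depending on the workload.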
- AMDVLK Vulkan beat ROCm on single-GPU generation for the 35B MoE and 27B dense models, with the biggest gains on the lightest active-compute path.
- ROCm dominated 27B prompt processing by 3.5-4x, which makes it the better pick for long-context RAG, summarization, and document analysis.
- The 122B dual-GPU run reverses the smaller-model pattern: ROCm wins both prefill and decode, so multi-GPU coordination finally pays for itself.
- M5 Max looks genuinely strong for local inference because unified memory lets it dominate prefill without PCIe baggage.
- W6800 is the cautionary tale: RDNA 2, ROCm incompatibility, and chipset x4 bandwidth can swamp any apples-to-apples GPU comparison.
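The AMD takeaways above can be condensed into a small lookup, sketched here as a starting point only. The function name, model labels, and phase strings are hypothetical. The rules simply restate the summary's findings and return `None` for any combination the summary does not cover. Hardware caveats such as the W6800's ROCm incompatibility are not modeled.

```python
def pick_backend(model, gpus, phase):
    """Suggest 'rocm' or 'vulkan' per the summary's takeaways, else None.

    model: '35b-moe', '27b-dense', or '122b' (hypothetical labels)
    gpus:  number of GPUs in use
    phase: 'prefill' or 'decode'
    """
    # 122B on two GPUs: ROCm won both prefill and decode.
    if model == "122b" and gpus >= 2:
        return "rocm"
    # 27B dense prompt processing: ROCm led by 3.5-4x.
    if model == "27b-dense" and phase == "prefill":
        return "rocm"
    # Single-GPU generation on the 35B MoE and 27B dense: Vulkan (AMDVLK) led.
    if gpus == 1 and phase == "decode" and model in ("35b-moe", "27b-dense"):
        return "vulkan"
    # Anything else is not stated in the summary.
    return None


print(pick_backend("27b-dense", 1, "prefill"))
print(pick_backend("35b-moe", 1, "decode"))
print(pick_backend("122b", 2, "decode"))
```

Returning `None` instead of guessing keeps the helper honest about what was reported, for example prefill on the 35B MoE model.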
DISCOVERED
2026-03-26
PUBLISHED
2026-03-26
AUTHOR
neuromacmd