OPEN_SOURCE ↗
REDDIT // 3d ago // BENCHMARK RESULT
LocalAI Qwen 3.5 35B benchmark: Vulkan wins
LocalAI benchmarked Qwen 3.5 35B MoE variants on Strix Halo and found a clean split: Vulkan led token generation, while ROCm won prompt processing. The tests stretched from zero context to 200K tokens with prefix caching enabled, which makes this a useful read for anyone tuning local inference on AMD hardware.
// ANALYSIS
This is less a universal backend verdict than a workload split, and that matters. If your app is chatty and decode-heavy, Vulkan looks like the better default; if your pipeline is prompt-heavy, ROCm still has an edge.
- The result held across two different Qwen3.5-35B variants, so the backend pattern looks real rather than quant-specific noise.
- Vulkan’s roughly 10-15% generation lead is the number to watch, because token streaming is what users feel in interactive sessions.
- ROCm’s prompt-processing advantage is still meaningful for ingestion, long-context preprocessing, and batch-style local workflows.
- Prefix caching and 200K-context tests make this relevant for agentic use cases, not just short chat prompts.
- For Strix Halo and similar AMD APUs, backend choice should now be workload-specific instead of assumed.
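The workload split above can be turned into a simple rule: estimate end-to-end latency from each backend's measured prefill and decode throughput, then pick the faster one per job. A minimal sketch follows; the tok/s figures are illustrative placeholders, not the thread's actual Strix Halo numbers, so substitute your own llama-bench (or equivalent) measurements.

```python
# Sketch: pick a backend per workload from measured throughputs.
# The tok/s numbers below are ASSUMED placeholders that only encode the
# qualitative split reported in the post (Vulkan faster at decode,
# ROCm faster at prompt processing). Replace with your own measurements.

BACKENDS = {
    "vulkan": {"pp": 900.0, "tg": 46.0},   # assumed: faster token generation
    "rocm":   {"pp": 1100.0, "tg": 40.0},  # assumed: faster prompt processing
}

def estimated_seconds(backend: str, prompt_tokens: int, gen_tokens: int) -> float:
    """End-to-end latency estimate: prefill time plus decode time."""
    b = BACKENDS[backend]
    return prompt_tokens / b["pp"] + gen_tokens / b["tg"]

def pick_backend(prompt_tokens: int, gen_tokens: int) -> str:
    """Choose the backend with the lower estimated total time."""
    return min(BACKENDS, key=lambda n: estimated_seconds(n, prompt_tokens, gen_tokens))

# Chatty session: short prompt, long reply -> decode throughput dominates.
print(pick_backend(prompt_tokens=500, gen_tokens=1500))      # -> vulkan
# Ingestion job: huge prompt, short summary -> prefill dominates.
print(pick_backend(prompt_tokens=150_000, gen_tokens=300))   # -> rocm
```

With the assumed numbers, the chatty workload selects Vulkan and the long-context ingestion job selects ROCm, matching the post's "workload-specific, not assumed" conclusion.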
// TAGS
localai · llm · gpu · inference · benchmark · self-hosted
DISCOVERED
3d ago
2026-04-08
PUBLISHED
4d ago
2026-04-08
RELEVANCE
8/10
AUTHOR
pipould