LocalAI Qwen 3.5 35B benchmark: Vulkan wins
OPEN_SOURCE · REDDIT · 3d ago · BENCHMARK RESULT


LocalAI benchmarked Qwen 3.5 35B MoE variants on Strix Halo and found a clean split: Vulkan led token generation, while ROCm won prompt processing. The tests stretched from zero context to 200K tokens with prefix caching enabled, which makes this a useful read for anyone tuning local inference on AMD hardware.

// ANALYSIS

This is less a universal backend verdict than a workload split, and that matters. If your app is chatty and decode-heavy, Vulkan looks like the better default; if your pipeline is prompt-heavy, ROCm still has an edge.

  • The result held across two different Qwen3.5-35B variants, so the backend pattern looks real rather than quant-specific noise.
  • Vulkan’s roughly 10-15% generation lead is the number to watch, because token streaming is what users feel in interactive sessions.
  • ROCm’s prompt-processing advantage is still meaningful for ingestion, long-context preprocessing, and batch-style local workflows.
  • Prefix caching and 200K-context tests make this relevant for agentic use cases, not just short chat prompts.
  • For Strix Halo and similar AMD APUs, backend choice should now be workload-specific instead of assumed.
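The workload-specific choice above can be reduced to simple arithmetic: per-request latency is prompt tokens over prompt-processing speed plus output tokens over generation speed, so the better backend flips depending on the mix. A minimal sketch, with made-up throughput numbers purely for illustration (not the benchmark's measurements; `pick_backend` is a hypothetical helper, and real figures should come from your own runs, e.g. with llama-bench):

```python
def pick_backend(avg_prompt_tokens, avg_gen_tokens, speeds):
    """Pick the backend minimizing expected per-request latency.

    speeds: {backend_name: (prompt_tps, gen_tps)} -- measure these on
    your own hardware; the figures below are invented for illustration.
    """
    def latency(pp_tps, tg_tps):
        # Time to ingest the prompt plus time to stream the output.
        return avg_prompt_tokens / pp_tps + avg_gen_tokens / tg_tps

    return min(speeds, key=lambda name: latency(*speeds[name]))


# Hypothetical speeds mirroring the reported pattern: ROCm faster at
# prompt processing, Vulkan faster at token generation.
speeds = {"vulkan": (700, 23), "rocm": (900, 20)}

pick_backend(500, 500, speeds)      # chatty, decode-heavy -> "vulkan"
pick_backend(50_000, 500, speeds)   # long-context ingestion -> "rocm"
```

With these invented numbers, the crossover sits wherever prompt-ingestion savings outweigh the slower decode, which is exactly why a single "winner" verdict is less useful than profiling your own prompt/output ratio.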
// TAGS
localai · llm · gpu · inference · benchmark · self-hosted

DISCOVERED: 2026-04-08 (3d ago)

PUBLISHED: 2026-04-08 (4d ago)

RELEVANCE: 8/10

AUTHOR: pipould