Strix Halo hits 19 tok/s on Qwen3.5-397B
REDDIT // 18d ago // BENCHMARK RESULT

A breakthrough configuration for AMD's Ryzen AI Max+ 395 (Strix Halo) enables the massive 397B Qwen3.5 MoE model to run at 17-19 tokens/second on a single integrated GPU. By bypassing ROCm's 60GB memory allocation limits and driver instabilities in favor of the open-source Mesa RADV Vulkan driver, users can successfully offload all 61 model layers to the 128GB unified memory pool, achieving nearly triple the performance of Windows-based HIP setups.
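
The summary does not say which quantization was used, so the following is only a back-of-the-envelope sketch of why a 397B-parameter model needs aggressive quantization to sit entirely in a 128GB unified memory pool. The 120 GiB budget, the candidate bits-per-weight values, and the function names are illustrative assumptions, not figures from the original post, and the estimate ignores KV cache and runtime overhead.

```python
# Rough weight-footprint arithmetic; every input here is an assumption
# chosen for illustration, not a measurement from the original post.
GIB = 1024 ** 3


def weights_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GiB."""
    return n_params * bits_per_weight / 8 / GIB


def max_bits_per_weight(n_params: float, budget_gib: float) -> float:
    """Largest average bits per weight that still fits in the budget."""
    return budget_gib * GIB * 8 / n_params


if __name__ == "__main__":
    n_params = 397e9      # total parameters (all experts), from the model name
    budget_gib = 120      # hypothetical share of the 128GB pool given to the GPU
    for bpw in (2.0, 2.5, 3.0, 4.0):
        size = weights_gib(n_params, bpw)
        verdict = "fits" if size <= budget_gib else "does not fit"
        print(f"{bpw:.1f} bits/weight -> {size:6.1f} GiB ({verdict})")
    print(f"max average bits/weight: {max_bits_per_weight(n_params, budget_gib):.2f}")
```

Under these assumptions the weights only fit at roughly 2.6 bits per weight or less, which is why headroom for context and the operating system matters so much on this platform.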

// ANALYSIS

Vulkan is the surprise hero here: for local LLM inference on this hardware, the open-source Mesa RADV driver beats AMD's official compute stack on both stability and throughput. The configuration sidesteps the 60GB hipMalloc limit on Windows and the persistent ROCm segfaults on the gfx1151 architecture, and uses the 128GB LPDDR5X unified memory pool to position a $2,500 consumer chip as a viable alternative to multi-H100 setups for running this model. It shows that an iGPU can serve a 300B+ parameter model at usable speeds when correctly tuned. The catch is Linux kernel tuning: settings such as ttm.pages_limit must be adjusted before the integrated Radeon 8060S GPU can use most of the unified memory.
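
The post does not give the exact ttm.pages_limit value, so the sketch below only shows the arithmetic behind such a setting. It assumes the limit is counted in memory pages and that the page size is 4 KiB (the usual x86-64 default); the 120 GiB target and the helper names are hypothetical.

```python
# Convert a GPU-visible memory target into a ttm.pages_limit value.
# Assumption: the limit is expressed in pages of PAGE_SIZE_BYTES each.
PAGE_SIZE_BYTES = 4096  # assumed 4 KiB pages; verify on the target system


def ttm_pages_limit(target_gib: int, page_size: int = PAGE_SIZE_BYTES) -> int:
    """Number of pages needed to cover target_gib GiB."""
    return target_gib * 1024 ** 3 // page_size


def kernel_arg(target_gib: int) -> str:
    """Format the value as a kernel command-line argument."""
    return f"ttm.pages_limit={ttm_pages_limit(target_gib)}"


if __name__ == "__main__":
    # 120 GiB is an illustrative target, leaving some memory for the OS.
    print(kernel_arg(120))
```

The resulting string would typically be appended to the kernel command line in the bootloader configuration, followed by a reboot; the right target depends on how much memory the rest of the system needs.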

// TAGS
qwen3.5-397b-a17b · qwen-3.5 · strix-halo · vulkan · llama-cpp · llm · gpu · inference · open-source

DISCOVERED

18d ago

2026-03-25

PUBLISHED

18d ago

2026-03-24

RELEVANCE

8/10

AUTHOR

ricraycray