
Strix Halo hits 19 tok/s on Qwen3.5-397B

AICrier tracks AI developer news across Product Hunt, GitHub, Hacker News, YouTube, X, arXiv, and more.


// BENCHMARK RESULT


A tuned configuration for AMD's Ryzen AI Max+ 395 (Strix Halo) runs the 397B-parameter Qwen3.5 mixture-of-experts (MoE) model at 17-19 tokens/second on the chip's integrated GPU alone. Rather than ROCm, which hits a 60GB memory allocation limit and suffers from driver instability, the setup uses the open-source Mesa RADV Vulkan driver, which lets users offload all 61 model layers to the 128GB unified memory pool. The result is nearly triple the throughput of Windows-based HIP setups.
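A back-of-the-envelope check shows why the MoE design is what makes this feasible. This is an illustrative sketch, not the reporter's method: the bits-per-weight values are assumptions (the report does not say which quantization was used), the ~17B active parameters per token is inferred from the "a17b" in the model tag, and the ~256 GB/s peak bandwidth is an assumed platform figure. KV cache, OS, and runtime overheads are ignored.

```python
# Rough sizing for Qwen3.5-397B-A17B on a 128 GiB Strix Halo system.
# ASSUMPTIONS (not stated in the article): the average bits-per-weight values
# tried below, ~17B active parameters per token (read off the "a17b" model
# tag), and ~256 GB/s peak memory bandwidth. KV cache, OS, and runtime
# overheads are ignored.

TOTAL_PARAMS = 397e9       # total parameters across all experts
ACTIVE_PARAMS = 17e9       # parameters touched per generated token (assumed)
PEAK_BW_BYTES_S = 256e9    # assumed peak LPDDR5X bandwidth, bytes/s
POOL_GIB = 128             # unified memory pool, GiB
GIB = 2**30                # bytes per GiB


def weight_bytes(params: float, bits_per_weight: float) -> float:
    """Bytes needed to store `params` weights at an average `bits_per_weight`."""
    return params * bits_per_weight / 8


def weights_gib(params: float, bits_per_weight: float) -> float:
    """Same quantity in GiB, for comparison against the unified memory pool."""
    return weight_bytes(params, bits_per_weight) / GIB


def decode_ceiling_tok_s(bits_per_weight: float) -> float:
    """Bandwidth-bound upper limit on decode speed: each generated token
    must stream the active weights from memory at least once."""
    return PEAK_BW_BYTES_S / weight_bytes(ACTIVE_PARAMS, bits_per_weight)


for bpw in (2.0, 2.5, 3.0, 4.0):
    size = weights_gib(TOTAL_PARAMS, bpw)
    verdict = "under" if size < POOL_GIB else "over"
    print(f"{bpw:.1f} bpw: {size:6.1f} GiB ({verdict} {POOL_GIB} GiB), "
          f"decode ceiling ~{decode_ceiling_tok_s(bpw):.0f} tok/s")
```

Under these assumptions, the full set of weights fits in 128 GiB only at an average of just under 2.8 bits per weight, before any room for KV cache. A dense 397B model at 2.5 bpw would be capped near 2 tok/s by bandwidth alone, while streaming only ~17B active parameters per token raises the ceiling to roughly 48 tok/s, comfortably above the reported 17-19 tok/s.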

// ANALYSIS

Vulkan is the surprise hero for AMD compute here, suggesting that open-source graphics drivers can outshine official compute stacks in both stability and throughput for local LLM inference. The configuration sidesteps the 60GB hipMalloc limit on Windows and the persistent ROCm segfaults on the gfx1151 architecture, using 128GB of LPDDR5X unified memory to turn a ~$2,500 consumer platform into a viable alternative to multi-H100 setups for models of this size. It shows that, when correctly optimized, iGPUs can now run 300B+ parameter models at practical speeds, though this requires specific Linux kernel tuning, such as adjusting ttm.pages_limit, so that the integrated Radeon 8060S GPU can use most of the unified memory pool.
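The ttm.pages_limit tuning refers to a parameter of TTM, the Linux kernel memory manager that the amdgpu driver uses for GPU-accessible memory. As a minimal sketch, assuming the limit is a count of 4 KiB pages (the usual page size on x86-64), the helper below converts a desired budget in GiB into a boot-parameter value. The helper names and example budgets are hypothetical; check your kernel version's documentation for how the parameter is interpreted, and leave headroom for the OS.

```python
# Hypothetical helper: turn a GPU memory budget into a `ttm.pages_limit` value.
# ASSUMPTION: the parameter counts pages of PAGE_SIZE bytes (4 KiB on x86-64).

PAGE_SIZE = 4096   # bytes per page (assumed x86-64 default)
GIB = 2**30        # bytes per GiB


def ttm_pages_limit(budget_gib: int, page_size: int = PAGE_SIZE) -> int:
    """Whole pages needed to cover `budget_gib` GiB of memory."""
    return budget_gib * GIB // page_size


def cmdline_fragment(budget_gib: int) -> str:
    """Kernel boot-parameter fragment in module.param=value form."""
    return f"ttm.pages_limit={ttm_pages_limit(budget_gib)}"


# Example budgets (illustrative, not from the article); the rest of the
# 128 GiB pool is left for the OS and CPU-side processes.
for budget in (96, 108, 120):
    print(f"{budget} GiB -> {cmdline_fragment(budget)}")
```

For example, a 120 GiB budget works out to 120 × 262,144 = 31,457,280 pages; integer division keeps the result a whole page count.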

// TAGS
qwen3.5-397b-a17b, qwen-3.5, strix-halo, vulkan, llama-cpp, llm, gpu, inference, open-source

DISCOVERED: 2026-03-25
PUBLISHED: 2026-03-24
RELEVANCE: 8/10
AUTHOR: ricraycray