Strix Halo hits 19 tok/s on Qwen3.5-397B

// 110d ago · BENCHMARK RESULT

A new configuration for AMD's Ryzen AI Max+ 395 (Strix Halo) runs the 397B-parameter Qwen3.5 MoE model at 17-19 tokens/second on the chip's integrated GPU alone. Instead of ROCm, which caps hipMalloc allocations at 60GB on Windows and has been unstable on this hardware, the setup uses the open-source Mesa RADV Vulkan driver. With it, users can offload all 61 model layers into the 128GB unified memory pool and get nearly triple the throughput of Windows-based HIP setups.
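The post does not give the exact command line. As a hedged sketch, assuming llama.cpp (listed in the tags) built with its Vulkan backend (`cmake -B build -DGGML_VULKAN=ON`), "offload all 61 layers" maps to llama.cpp's `-ngl` option. The model filename and the 8192-token context size below are hypothetical placeholders, not values from the source.

```python
import shlex

# Hypothetical model path: the source names neither the GGUF file nor its quantization.
MODEL_PATH = "models/qwen3.5-397b-a17b.gguf"

# The article reports all 61 layers offloaded to the integrated GPU.
GPU_LAYERS = 61


def build_llama_server_argv(model_path: str, gpu_layers: int, ctx_size: int = 8192) -> list[str]:
    """Return an argv list for llama.cpp's llama-server.

    Assumes the binary was compiled with the Vulkan backend, so the
    layers offloaded with -ngl run through the Vulkan (RADV) driver.
    The default ctx_size is an illustrative assumption.
    """
    if gpu_layers < 0:
        raise ValueError("gpu_layers must be non-negative")
    return [
        "llama-server",
        "-m", model_path,         # path to the GGUF model file
        "-ngl", str(gpu_layers),  # number of layers to offload to the GPU
        "-c", str(ctx_size),      # context window size in tokens
    ]


if __name__ == "__main__":
    argv = build_llama_server_argv(MODEL_PATH, GPU_LAYERS)
    # Print a shell-safe command line that can be pasted into a terminal.
    print(shlex.join(argv))
```

Returning an argv list rather than a single string keeps the arguments safe to pass straight to `subprocess.run` without shell quoting issues.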

// ANALYSIS

Vulkan emerges as an unexpected strength in AMD's compute story: here an open-source graphics driver beats the official compute stack on both stability and throughput for local LLM inference. The configuration sidesteps the 60GB hipMalloc limit on Windows and the persistent ROCm segfaults on the gfx1151 architecture, using 128GB of LPDDR5X unified memory to turn a $2,500 consumer chip into a viable alternative to multi-H100 setups for this workload.

It shows that, when correctly optimized, iGPUs can now run 300B+ parameter models at high speed. The catch is that it requires specific Linux kernel tuning, such as raising ttm.pages_limit, so that the integrated Radeon 8060S GPU can use the full memory pool.
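ttm.pages_limit is a parameter of the Linux kernel's TTM memory manager (used by the amdgpu driver), and it is counted in pages, not bytes. A minimal sketch of the conversion, assuming 4 KiB pages (the x86-64 default) and a hypothetical 120 GiB budget that leaves part of the 128GB pool to the OS; the source does not state the value the author actually used.

```python
# Assumption: 4 KiB pages, the default page size on x86-64 Linux.
PAGE_SIZE_BYTES = 4096
GIB = 1024 ** 3


def ttm_pages_limit(target_gib: float, page_size: int = PAGE_SIZE_BYTES) -> int:
    """Convert a memory budget in GiB into a page count for ttm.pages_limit."""
    if target_gib <= 0:
        raise ValueError("target_gib must be positive")
    return int(target_gib * GIB) // page_size


def kernel_cmdline_fragment(target_gib: float) -> str:
    """Format the value as a kernel command-line parameter."""
    return f"ttm.pages_limit={ttm_pages_limit(target_gib)}"


if __name__ == "__main__":
    # Hypothetical budget: 120 GiB of the 128GB unified memory pool.
    print(kernel_cmdline_fragment(120))
```

The printed fragment would typically be added to the kernel command line through the bootloader configuration and takes effect after a reboot.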

// TAGS

qwen3.5-397b-a17b, qwen-3.5, strix-halo, vulkan, llama-cpp, llm, gpu, inference, open-source

DISCOVERED: 2026-03-25
PUBLISHED: 2026-03-24
RELEVANCE: 8/10
AUTHOR: ricraycray

// KEEP READING

UPDATE · 34m ago

Perplexity Computer integrates Grok 4.5

Perplexity has integrated xAI's Grok 4.5 as the orchestrator for Perplexity Computer, where it achieved the top score of 0.328 on Perplexity's internal WANDR benchmark. It is also cost-effective, running at roughly half the cost of Anthropic's Claude Opus 4.8.

UPDATE · 45m ago

Inference optimizations boost GPT-5.6 Sol usage limits

Recent updates to Codex and ChatGPT Work introduce inference optimizations whose savings are passed directly to users. The result is roughly 10% more usage on all GPT-5.6 Sol subscriptions, with the improvement coming without any feature restrictions.

UPDATE · 1h ago

Claude Code ignores admin SCIM plugin policies

An enterprise user highlighted a critical gap: marketplace plugin selection policies configured in the Claude Admin panel and mapped to SCIM groups are neither synced to nor enforced in Claude Code. Because the CLI still relies on local configuration controls rather than real-time organization policies, this breaks the centralized administration model for organizations trying to deploy Claude broadly and securely across developer environments.