REDDIT · BENCHMARK RESULT

Strix Halo makes 122B local inference usable

A LocalLLaMA benchmark run shows AMD's Ryzen AI Max+ 395 (Radeon 8060S graphics, 128 GB unified memory) running ROCm 7.2 and llama.cpp pushing far past the usual iGPU ceiling. The standout results are roughly 21 tok/s generation on a 122B Qwen3.5 MoE quant and nearly 6,000 tok/s prompt processing on Qwen3.5-0.8B at Q4.
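
For anyone who wants to sanity-check numbers like these on their own hardware, here is a minimal sketch that drives llama.cpp's llama-bench tool from Python. It assumes a ROCm build of llama.cpp is already present on the machine; the GGUF filenames are hypothetical placeholders, not the exact quants from the post.

```python
# Minimal sketch: run llama-bench against the two scenarios from the post.
# Assumes a ROCm (HIP) build of llama.cpp; model filenames are hypothetical.
import subprocess

RUNS = [
    # (model file, prompt tokens, generation tokens)
    ("Qwen3.5-122B-MoE-Q4_K_M.gguf", 512, 128),  # ~21 tok/s generation claim
    ("Qwen3.5-0.8B-Q4_K_M.gguf", 2048, 0),       # ~6,000 tok/s prompt-processing claim
]

for model, n_prompt, n_gen in RUNS:
    subprocess.run(
        [
            "./llama-bench",
            "-m", model,          # GGUF quant to benchmark
            "-p", str(n_prompt),  # prompt-processing test size
            "-n", str(n_gen),     # token-generation test size
            "-ngl", "99",         # offload all layers to the 8060S iGPU
        ],
        check=True,
    )
```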

// ANALYSIS

This is the kind of benchmark that makes local AI hardware feel materially different, not just incrementally faster. Strix Halo is starting to look like a legitimate single-box inference platform for developers who want to run huge GGUF models without jumping to a discrete GPU rig.

  • The real headline is not the tiny-model speed but that a 122B-class quant is actually usable on integrated graphics with unified memory (see the back-of-the-envelope sketch after this list).
  • MoE models look especially strong here. Because only a few billion parameters are active per token (the A3B in Qwen3.5-35B-A3B denotes roughly 3B activated parameters), Qwen3.5-35B-A3B and GLM-4.7-Flash post numbers that make local experimentation much more practical.
  • ROCm 7.2 appears to be a meaningful step for AMD's local-LLM story, chipping away at the usual perception that CUDA is the only serious path.
  • For AI developers, this points to a new sweet spot between laptop-class convenience and workstation-class local inference capacity.
  • The benchmark is still a community Reddit post, not a vendor-run validation, but the results line up with the growing interest in Strix Halo as a local-LLM box.
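
A quick back-of-the-envelope check, sketched below, makes the headline numbers plausible. All three inputs are assumptions rather than figures from the post: roughly 0.56 bytes per parameter for a Q4_K_M-class quant, ~256 GB/s of unified-memory bandwidth for Strix Halo's 256-bit LPDDR5X-8000 bus, and about 10B active parameters per token for the 122B MoE.

```python
# Back-of-the-envelope sketch: does a 122B MoE quant fit in unified memory,
# and what is the bandwidth-bound generation ceiling? All constants below
# are assumptions, not numbers from the post.
Q4_BYTES_PER_PARAM = 0.56  # rough average for Q4_K_M-class quants
BANDWIDTH_GB_S = 256       # assumed Strix Halo unified-memory bandwidth

def weight_gb(params_billion: float) -> float:
    """Approximate GGUF weight size in GB at a Q4-class quant."""
    return params_billion * Q4_BYTES_PER_PARAM

def gen_ceiling_tok_s(active_params_billion: float) -> float:
    """Each generated token must stream the active weights from memory once."""
    return BANDWIDTH_GB_S / weight_gb(active_params_billion)

print(f"122B MoE weights: ~{weight_gb(122):.0f} GB")                # ~68 GB
print(f"ceiling @ 10B active: ~{gen_ceiling_tok_s(10):.0f} tok/s")  # ~46 tok/s
```

On those assumptions the Q4 weights come to roughly 68 GB, which fits comfortably in 128 GB of unified memory, and the reported ~21 tok/s sits at a believable fraction of the ~46 tok/s bandwidth ceiling.
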
// TAGS
amd-ryzen-ai-max-plus-395 · llm · gpu · inference · benchmark

DISCOVERED

2026-03-09

PUBLISHED

2026-03-09

RELEVANCE

8/10

AUTHOR

przbadu