OPEN_SOURCE ↗
REDDIT // NEWS · 7d ago
Strix Halo 128GB hits tool-calling walls
Developers using AMD's high-memory Strix Halo (Ryzen AI Max) hardware are reporting significant tool-calling failures with the Qwen3-Coder-Next model during agentic workflows. Despite the model's massive 256k native context window, users frequently encounter "failed tool calling loops" once the KV cache exceeds 20,000 tokens, specifically when attempting file-write operations in local agents like OpenCode.
// ANALYSIS
The 128GB unified memory on Strix Halo is a breakthrough for local AI, but software-side quantization is currently the bottleneck for reliable agentic coding at scale.
- 4-bit quantization (GGUF/EXL2) likely degrades the high-precision attention required for multi-file editing as context density increases beyond 20k tokens.
- AMD's "Lemonade" server provides essential ROCm optimization for the RDNA 3.5 iGPU, but logic stability remains a model-side compression issue.
- Users with 128GB of RAM should pivot to 8-bit (Q8_0) variants of 70B+ models, which fit comfortably and offer the stability needed for long-context tool calling.
- The failure in "Next-Coder" variants suggests that "agentic training" in MoE architectures may still struggle with the high-variance feedback loops of local terminal environments.
- Switching to agents with "Thinking Mode" support or git-verified workflows like Aider can help mitigate the risks of unrecoverable model loops.
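The claim that Q8_0 70B-class models "fit comfortably" in 128GB can be sanity-checked with back-of-envelope arithmetic. The sketch below assumes an illustrative dense 70B model with a GQA geometry (80 layers, 8 KV heads, head dimension 128) and an fp16 KV cache; these are generic assumptions, not measured figures for Qwen3-Coder-Next or any specific GGUF file.

```python
# Rough memory estimate for a Q8_0-quantized model plus a long-context
# KV cache on a 128 GB unified-memory machine (e.g. Strix Halo).
# All model dimensions below are illustrative assumptions.

def model_weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB for a given quantization level."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

def kv_cache_gb(context_tokens: int, layers: int, kv_heads: int,
                head_dim: int, bytes_per_elem: int = 2) -> float:
    """KV cache size: 2 (K and V) x layers x kv_heads x head_dim x tokens."""
    return (2 * layers * kv_heads * head_dim
            * context_tokens * bytes_per_elem / 1e9)

# Hypothetical dense 70B model; Q8_0 stores roughly 8.5 bits/weight
# once block scales are included.
weights = model_weight_gb(70, 8.5)
# Assumed GQA geometry: 80 layers, 8 KV heads, head_dim 128, fp16 cache.
cache = kv_cache_gb(context_tokens=32_000, layers=80, kv_heads=8, head_dim=128)

print(f"weights ~{weights:.1f} GB, 32k-token KV cache ~{cache:.1f} GB, "
      f"total ~{weights + cache:.1f} GB of 128 GB")
```

Under these assumptions the total lands around 85 GB, leaving headroom for the OS and runtime, which is consistent with the bullet's recommendation; actual figures vary with the model's real attention geometry and the runtime's cache quantization.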
// TAGS
qwen3-coder-next · qwen · ai-coding · agent · hardware · gpu · llm · open-source · amd
DISCOVERED
2026-04-04
PUBLISHED
2026-04-04
RELEVANCE
8 / 10
AUTHOR
Fireforce008