OPEN_SOURCE ↗
REDDIT // NEWS · 7d ago
Strix Halo 128GB hits tool-calling walls
Developers using AMD's high-memory Strix Halo (Ryzen AI Max) hardware are reporting significant tool-calling failures with the Qwen3-Coder-Next model during agentic workflows. Despite the model's massive 256k native context window, users frequently encounter "failed tool calling loops" once the KV cache exceeds 20,000 tokens, specifically when attempting file-write operations in local agents like OpenCode.
// ANALYSIS
The 128GB unified memory on Strix Halo is a breakthrough for local AI, but software-side quantization is currently the bottleneck for reliable agentic coding at scale.
- 4-bit quantization (GGUF/EXL2) likely degrades the high-precision attention required for multi-file editing as context density increases beyond 20k tokens.
- AMD's "Lemonade" server provides essential ROCm optimization for the RDNA 3.5 iGPU, but logic stability remains a model-side compression issue.
- Users with 128GB of RAM should pivot to 8-bit (Q8_0) variants of 70B+ models, which fit comfortably and offer the stability needed for long-context tool calling.
- The failure in "Next-Coder" variants suggests that "agentic training" in MoE architectures may still struggle with the high-variance feedback loops of local terminal environments.
- Switching to agents with "Thinking Mode" support or git-verified workflows like Aider can help mitigate the risks of unrecoverable model loops.
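The claim that Q8_0 70B-class models "fit comfortably" in 128GB can be sanity-checked with back-of-envelope arithmetic. The sketch below assumes an illustrative dense 70B model with a GQA geometry (80 layers, 8 KV heads, head dimension 128) and an fp16 KV cache; these are generic assumptions, not measured figures for Qwen3-Coder-Next or any specific GGUF file.

```python
# Rough memory estimate for a Q8_0-quantized model plus a long-context
# KV cache on a 128 GB unified-memory machine (e.g. Strix Halo).
# All model dimensions below are illustrative assumptions.

def model_weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB for a given quantization level."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

def kv_cache_gb(context_tokens: int, layers: int, kv_heads: int,
                head_dim: int, bytes_per_elem: int = 2) -> float:
    """KV cache size: 2 (K and V) x layers x kv_heads x head_dim x tokens."""
    return (2 * layers * kv_heads * head_dim
            * context_tokens * bytes_per_elem / 1e9)

# Hypothetical dense 70B model; Q8_0 stores roughly 8.5 bits/weight
# once block scales are included.
weights = model_weight_gb(70, 8.5)
# Assumed GQA geometry: 80 layers, 8 KV heads, head_dim 128, fp16 cache.
cache = kv_cache_gb(context_tokens=32_000, layers=80, kv_heads=8, head_dim=128)

print(f"weights ~{weights:.1f} GB, 32k-token KV cache ~{cache:.1f} GB, "
      f"total ~{weights + cache:.1f} GB of 128 GB")
```

Under these assumptions the total lands around 85 GB, leaving headroom for the OS and runtime, which is consistent with the bullet's recommendation; actual figures vary with the model's real attention geometry and the runtime's cache quantization.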
// TAGS
qwen3-coder-next · qwen · ai-coding · agent · hardware · gpu · llm · open-source · amd
DISCOVERED
2026-04-04
PUBLISHED
2026-04-04
RELEVANCE
8 / 10
AUTHOR
Fireforce008