OPEN_SOURCE
REDDIT // INFRASTRUCTURE
Ryzen AI Max+ 395 hits long-context wall
A Bosgame M5 128GB user on AMD's Strix Halo platform reports that Claude Code-style document agents feel faster on Vulkan than on ROCm, even though ROCm should win prompt processing on paper. The real pain point is long-context work: performance drops sharply once prompts push past roughly 50K tokens.
// ANALYSIS
Strix Halo is powerful enough for local agents, but this thread shows the real bottleneck is backend behavior under long context, not model size. AMD's own docs now position Ryzen AI Max+ 395 for MCP-heavy workflows, yet the software stack still needs tuning before it feels effortless.
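One reason long context dominates here is that the KV cache grows linearly with prompt length and competes with model weights for the same UMA pool. A rough sizing sketch; the model shape used below (80 layers, 8 grouped-query KV heads, head dim 128, fp16) is an illustrative assumption, not a configuration from the thread:

```python
def kv_cache_bytes(n_tokens: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """Bytes needed for the K and V caches at a given context length."""
    # Two tensors (K and V), one per layer, per KV head, per token.
    return 2 * n_layers * n_kv_heads * head_dim * n_tokens * bytes_per_elem

# Hypothetical 70B-class shape: 80 layers, 8 KV heads (GQA), head_dim 128.
gib = kv_cache_bytes(50_000, 80, 8, 128) / 2**30
print(f"{gib:.1f} GiB")  # → 15.3 GiB at 50K tokens
```

At 50K tokens that is on the order of 15 GiB of cache for this shape, on top of the weights, which is why where that cache lands (GPUVM/GTT-mapped system RAM vs. anything else) becomes a first-order knob.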
- AMD's ROCm docs note that the supported llama.cpp fork differs from upstream ggml-org builds, so the choice of Docker image can change behavior materially.
- Official Strix Halo guidance frames GPU memory as GPUVM/GTT-mapped system RAM, making UMA and KV-cache placement a first-order performance knob.
- Community reports on Strix Halo suggest ROCm can lead in prompt-processing tests, while Vulkan may feel smoother once generation and very long contexts are included.
- For document-centric agents: batch ingestion, reuse the KV cache, and benchmark at real context sizes rather than with small-prompt benchmarks.
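The "reuse the KV cache" advice amounts to structuring requests so consecutive prompts share a long common token prefix, since a server that still holds the cache for that prefix only needs to prefill the new suffix. A minimal sketch of the prefix-matching idea, with made-up token IDs:

```python
def reusable_prefix(cached: list[int], prompt: list[int]) -> int:
    """Number of leading tokens the new prompt shares with the cached one.

    A server holding the KV cache for `cached` only needs to prefill
    len(prompt) - n tokens for the new prompt.
    """
    n = 0
    for a, b in zip(cached, prompt):
        if a != b:
            break
        n += 1
    return n

# A document agent re-sends the same system/document prefix (here 4
# tokens) and appends a new question.
cached = [1, 17, 42, 99, 7, 7]
prompt = [1, 17, 42, 99, 3, 5, 8]
n = reusable_prefix(cached, prompt)
print(f"reuse {n} tokens, prefill {len(prompt) - n}")  # reuse 4, prefill 3
```

This is why putting the document before the question, rather than after it, matters for agent latency at 50K-token contexts: an edited question invalidates nothing, while an edited document invalidates everything after it.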
// TAGS
amd-ryzen-ai-max-plus-395 · llm · agent · inference · self-hosted · mcp · gpu
DISCOVERED
2026-03-24
PUBLISHED
2026-03-23
RELEVANCE
7/10
AUTHOR
Intelligent-Form6624