Ryzen AI Max+ 395 hits long-context wall
OPEN_SOURCE
REDDIT · 19d ago · INFRASTRUCTURE

A Bosgame M5 128GB user on AMD's Strix Halo platform reports that Claude Code-style document agents feel faster on Vulkan than on ROCm, even though ROCm should win prompt processing on paper. The real pain point is long-context work: performance drops sharply once prompts push past roughly 50K tokens.

// ANALYSIS

Strix Halo is powerful enough for local agents, but this thread shows the real bottleneck is backend behavior under long context, not model size. AMD's own docs now position Ryzen AI Max+ 395 for MCP-heavy workflows, yet the software stack still needs tuning before it feels effortless.

  • AMD's ROCm docs say the supported llama.cpp fork differs from upstream ggml-org builds, so Docker image choice can change behavior materially.
  • Official Strix Halo guidance frames memory as GPUVM/GTT-mapped system RAM, making UMA and KV-cache placement a first-order performance knob.
  • Community reports on Strix Halo suggest ROCm can lead prompt-processing tests, while Vulkan may feel smoother once generation and very long contexts are included.
  • For document-centric agents, batch document ingestion, reuse the KV cache across queries, and benchmark at realistic context sizes rather than with small-prompt tests.
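The UMA/GTT point above is usually acted on via kernel parameters: on Strix Halo the GPU addresses system RAM through GTT, so the allocatable pool is a boot-time knob rather than fixed VRAM. A hedged sketch of the kind of setting community guides pass on Linux; the exact values depend on total RAM and kernel version and are illustrative, not from the thread:

```shell
# /etc/default/grub — illustrative only; amdgpu.gttsize is in MiB.
# Here ~117 GiB of a 128 GiB system is made available to the GPU via GTT.
GRUB_CMDLINE_LINUX_DEFAULT="amdgpu.gttsize=120000"
```

After editing, regenerate the GRUB config and reboot for the new GTT limit to take effect.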
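The KV-cache advice in the last bullet can be sketched as a toy model: real engines such as llama.cpp keep the attention KV cache for a shared document prefix so repeated queries skip re-running prefill over it. The `process_prefix` and `answer` names below are illustrative stand-ins, not any engine's actual API:

```python
# Toy model of KV-cache reuse for a document-centric agent.
calls = {"count": 0}  # tracks how often the expensive prefill runs

def process_prefix(prefix: str) -> tuple:
    """Stand-in for expensive prompt processing (prefill) over the document."""
    calls["count"] += 1
    return tuple(hash(tok) for tok in prefix.split())

_cache: dict[str, tuple] = {}  # document text -> cached "KV state"

def answer(document: str, question: str) -> int:
    # Reuse the cached state for the document prefix when available.
    if document not in _cache:
        _cache[document] = process_prefix(document)
    state = _cache[document]
    # Only the short question is processed fresh on each query.
    return len(state) + len(question.split())

doc = "long shared context " * 1000   # ~3000 "tokens" of shared prefix
a1 = answer(doc, "what is the total?")
a2 = answer(doc, "summarize please")
# prefill ran once despite two queries over the same document
```

The same principle is why batching questions against one ingested document is much cheaper than re-sending the full context per query.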
// TAGS
amd-ryzen-ai-max-plus-395 · llm · agent · inference · self-hosted · mcp · gpu

DISCOVERED

19d ago

2026-03-24

PUBLISHED

19d ago

2026-03-23

RELEVANCE

7 / 10

AUTHOR

Intelligent-Form6624