Mistral Medium 3.5 benchmarks on Strix Halo
A LocalLLaMA user ran a quantized build of Mistral Medium 3.5 (128B) on an AMD Ryzen AI Max+ 395 Strix Halo mini PC with 128GB unified memory. The benchmark shows the model fits locally with room to spare for context, but sustained generation still lands at just 1.57 tokens/sec.
This is a useful proof point for local inference: Strix Halo-class APUs can now host frontier-scale open weights, but the experience is still constrained by throughput, heat, and power. Prompt processing at 32.1 tok/s is strong enough to make prefill and interactive setup feel usable, even if generation is slow. The post reports 79Gi of unified memory used and roughly 40Gi left for context, which is exactly why 128GB systems matter for 128B models. Peak power around 145-154W and fan noise up to 46 dBA make this feel like a compact workstation, not a quiet desktop.

The open-weights release plus local quantization makes Mistral Medium 3.5 a real option for teams that want control over data and deployment. The headline here is feasibility, not speed: this setup is interesting for private workflows, offline analysis, and long-context experiments, not fast interactive coding at scale.
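To make the throughput numbers concrete, here is a back-of-envelope latency estimate using the two figures reported in the post (32.1 tok/s prefill, 1.57 tok/s generation). The prompt and response sizes are hypothetical workloads chosen for illustration, and the model ignores scheduling and sampling overhead:

```python
# Figures reported in the benchmark post:
PREFILL_TOK_S = 32.1   # prompt-processing speed
GEN_TOK_S = 1.57       # sustained generation speed

def request_seconds(prompt_tokens: int, output_tokens: int) -> float:
    """Rough end-to-end time: prefill plus generation, ignoring overhead."""
    return prompt_tokens / PREFILL_TOK_S + output_tokens / GEN_TOK_S

# Hypothetical workload: a 2,000-token prompt with a 500-token answer.
t = request_seconds(2000, 500)
print(f"{t/60:.1f} minutes")  # roughly 6.3 minutes, dominated by generation
```

The split matters: prefill accounts for about a minute here, while generation takes over five, which is why the setup suits batch or offline work rather than fast interactive use.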
DISCOVERED
2026-05-04
PUBLISHED
2026-05-04
AUTHOR
westsunset