OPEN_SOURCE
REDDIT // BENCHMARK RESULT
M5 Pro 48GB doubles VRAM, trails bandwidth
A local LLM enthusiast benchmarks the 48GB M5 Pro against the NVIDIA RTX A5000, questioning whether unified memory can compete with discrete GPU speeds. Apple's 307 GB/s bandwidth is roughly 40% of the A5000's 768 GB/s, but the 48GB capacity enables local inference for 50B-70B models that 24GB VRAM cards cannot run without severe performance penalties.
// ANALYSIS
The M5 Pro is a capacity king but a bandwidth underdog, making it a "slow and steady" alternative to high-end NVIDIA GPUs for large models.
- 48GB unified memory allows running 50B-70B models at high precision, whereas the 24GB RTX A5000 requires heavy quantization or slow CPU offloading.
- For models under 30B, the A5000's 768 GB/s memory bandwidth will significantly outperform the M5 Pro's 307 GB/s.
- Native MLX support on Apple Silicon is required to bridge the performance gap with CUDA, offering a 20-30% boost over standard llama.cpp.
- Expect roughly 30-40 TPS for 35B models on the M5 Pro; the user's 100 TPS on the A5000 likely stems from high-speed MoE architectures or aggressive quantization.
- The M5 Pro remains the superior choice for large context windows (128k+) and multi-model workflows that exceed the strict VRAM limits of single-GPU setups.
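A rough way to sanity-check the throughput figures above is the standard bandwidth-bound estimate for single-stream decoding: each generated token must stream all active weights through memory once, so tokens/sec is capped at bandwidth divided by model bytes. The bandwidth numbers below come from the post; the model size and 4-bit quantization width are illustrative assumptions.

```python
# Bandwidth-bound upper estimate for single-batch LLM decoding:
# every token reads all active weights once, so
#   tokens/sec <= memory_bandwidth / bytes_of_active_weights.

def bandwidth_bound_tps(params_billion: float,
                        bytes_per_param: float,
                        bandwidth_gb_s: float) -> float:
    """Theoretical ceiling on tokens/sec, ignoring KV-cache and compute."""
    model_gb = params_billion * bytes_per_param  # active weights in GB
    return bandwidth_gb_s / model_gb

# Assumed example: a dense 35B model quantized to ~4 bits (0.5 bytes/param).
m5_pro = bandwidth_bound_tps(35, 0.5, 307)   # M5 Pro, 307 GB/s
a5000  = bandwidth_bound_tps(35, 0.5, 768)   # RTX A5000, 768 GB/s

print(f"M5 Pro ceiling: {m5_pro:.1f} tok/s")  # ~17.5 tok/s
print(f"A5000 ceiling:  {a5000:.1f} tok/s")   # ~43.9 tok/s
```

Real throughput lands below this ceiling, and 100 TPS on an A5000 implies far fewer active bytes per token, consistent with the MoE or aggressive-quantization explanation above.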
// TAGS
llm · inference · gpu · infrastructure · apple-m5-pro · rtx-a5000 · mlx · llama-cpp
DISCOVERED
2026-04-13
PUBLISHED
2026-04-13
RELEVANCE
8/10
AUTHOR
Overall-Somewhere760