OPEN_SOURCE ↗
REDDIT // 19d ago · BENCHMARK RESULT
llama.cpp Thread Seeks M5 Max Results
A LocalLLaMA user is asking for a standard `llama-bench` run on an M5 Max with Llama 2 7B Q4_0, using `-p 512 -n 128 -ngl 99` for full Metal offload. The goal is a clean PP/TG datapoint for the official llama.cpp Apple Silicon performance thread.
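The requested run maps onto `llama-bench`'s standard flags. A sketch of the invocation, assuming a local Q4_0 GGUF of Llama 2 7B (the model filename here is illustrative, not from the post):

```shell
# Standard llama-bench run for the Apple Silicon tracking thread:
#   -m    path to the quantized model (filename is a placeholder)
#   -p    prompt-processing batch of 512 tokens (the PP number)
#   -n    generate 128 tokens (the TG number)
#   -ngl  offload all layers to the Metal GPU backend
llama-bench -m llama-2-7b.Q4_0.gguf -p 512 -n 128 -ngl 99
```

`llama-bench` prints PP and TG throughput in tokens per second, which is the datapoint format the tracking thread collects.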
// ANALYSIS
This is the kind of boring benchmark ask that actually moves local-LLM hardware decisions. One reproducible M5 Max run tells buyers more than a month of spec-sheet wars.
- The requested `llama-bench` command is standardized, so any reply will slot cleanly into the llama.cpp Apple Silicon tracking thread.
- PP/TG matters because prefill and generation stress different parts of the stack.
- The post reflects how local inference on Macs still depends on community benchmarks rather than marketing claims.
- If the numbers land, this becomes a practical reference for people choosing a MacBook Pro or Studio for LLM work.
// TAGS
llama-cpp · m5-max · benchmark · llm · inference · open-source · gpu
DISCOVERED
19d ago
2026-03-23
PUBLISHED
19d ago
2026-03-23
RELEVANCE
8/10
AUTHOR
ForsookComparison