llama.cpp Thread Seeks M5 Max Results
OPEN_SOURCE
REDDIT · 19d ago · BENCHMARK RESULT


A LocalLLaMA user is asking for a standard `llama-bench` run on an M5 Max with Llama 2 7B Q4_0, using `-p 512 -n 128 -ngl 99` for full Metal offload. The goal is a clean PP/TG datapoint for the official llama.cpp Apple Silicon performance thread.
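The ask maps to a single command; a minimal sketch, assuming llama.cpp is built with Metal support and that the model path below (hypothetical) points at a Llama 2 7B Q4_0 GGUF:

```shell
# Standardized run for the Apple Silicon thread:
#   -p 512  -> prompt-processing (prefill) benchmark over 512 tokens
#   -n 128  -> token-generation benchmark over 128 tokens
#   -ngl 99 -> offload all layers to the GPU (Metal)
./llama-bench -m models/llama-2-7b.Q4_0.gguf -p 512 -n 128 -ngl 99
```

llama-bench reports prompt-processing (pp512) and token-generation (tg128) throughput in tokens per second, which is the PP/TG datapoint the thread collects.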

// ANALYSIS

This is the kind of boring benchmark ask that actually moves local-LLM hardware decisions. One reproducible M5 Max run tells buyers more than a month of spec-sheet wars.

  • The requested `llama-bench` command is standardized, so any reply will slot cleanly into the llama.cpp Apple Silicon tracking thread.
  • PP/TG matters because prompt processing (prefill) and token generation stress different parts of the stack: prefill is largely compute-bound, generation largely memory-bandwidth-bound.
  • The post reflects how local inference on Macs still depends on community benchmarks rather than marketing claims.
  • If the numbers land, this becomes a practical reference for people choosing a MacBook Pro or Studio for LLM work.
// TAGS
llama-cpp · m5-max · benchmark · llm · inference · open-source · gpu

DISCOVERED

19d ago

2026-03-23

PUBLISHED

19d ago

2026-03-23

RELEVANCE

8/10

AUTHOR

ForsookComparison