OPEN_SOURCE
REDDIT · 13d ago · BENCHMARK RESULT
LM Studio M5 speeds look uneven
A user with a 32GB M5 MacBook Pro is sanity-checking LM Studio throughput after seeing 8 t/s on Gemma 3 27B 4-bit MLX, 32 t/s on Nemotron 3 Nano 4B GGUF, and 39 t/s on GPT OSS 20B MLX at default context settings. The thread asks for comparable numbers from other M5 MacBook Air/Pro machines to establish whether the Gemma slowdown is expected or a tuning issue.
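For anyone wanting to post comparable numbers, a rough way to probe throughput is to time a streamed completion against LM Studio's local OpenAI-compatible server. A minimal sketch, assuming the server is running at its default http://localhost:1234/v1 and that the model identifier is swapped for whatever is actually loaded (the name below is a placeholder, not taken from the thread):

```python
# Rough tokens/sec probe against LM Studio's local OpenAI-compatible server.
# Assumptions: default endpoint http://localhost:1234/v1, any API key accepted,
# and MODEL replaced with the identifier of the loaded model.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
MODEL = "gemma-3-27b-it-4bit"  # placeholder; check client.models.list() for real IDs

start = time.time()
chunks = 0
stream = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": "Summarize the history of the Mac in 300 words."}],
    max_tokens=512,
    stream=True,
)
for chunk in stream:
    # Each streamed chunk usually carries about one token of text,
    # so the chunk count is a rough proxy for generated tokens.
    if chunk.choices and chunk.choices[0].delta.content:
        chunks += 1

elapsed = time.time() - start
print(f"~{chunks / elapsed:.1f} tok/s ({chunks} chunks in {elapsed:.1f}s)")
```

The chunk count is only an approximation of token count, and the elapsed time includes prompt processing, so this will typically read a little below the generation speed LM Studio reports in its own UI.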
// ANALYSIS
Local AI speed on Apple Silicon is a moving target, and this reads less like a broken machine than a calibration check for LM Studio's newest runtime path.
- LM Studio officially supports both `llama.cpp` GGUF and Apple `MLX` on Apple Silicon, and Apple calls out LM Studio as one of the apps that should benefit from M5's Neural Accelerators.
- LM Studio has already shipped M5-specific MLX NAX auto-upgrade fixes, so benchmark comparisons on this chip age fast and older runtime builds can undershoot.
- Community M5 reports already show Nemotron Nano 4-bit at around ~55 t/s after a runtime switch, which makes the poster's 32 t/s believable but not especially fast.
- The 27B Gemma run is the outlier: 8 t/s points to a bandwidth-heavy workload, a heavier context, or a model/backend pairing that is not hitting the chip's best path (see the bandwidth sketch below).
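A back-of-envelope sketch of the bandwidth arithmetic behind that read, assuming roughly 153 GB/s of unified memory bandwidth for the base M5 and ~0.55 bytes per parameter for a 4-bit quant including scales and overhead (both assumed figures, not from the thread):

```python
# Decode ceiling for a bandwidth-bound dense model: each generated token has to
# stream roughly the full weight set from unified memory, so
# tok/s is capped near memory_bandwidth / weight_bytes.
# Assumed numbers (not from the thread): ~153 GB/s base-M5 bandwidth,
# ~0.55 bytes/param for a 4-bit quant with scales and overhead.
BANDWIDTH_GBPS = 153.0
BYTES_PER_PARAM = 0.55

def decode_ceiling(params_billion: float) -> float:
    """Upper bound on tokens/sec if decoding is purely memory-bandwidth-limited."""
    weight_gb = params_billion * BYTES_PER_PARAM
    return BANDWIDTH_GBPS / weight_gb

for name, params in [("Gemma 3 27B (dense)", 27.0), ("Nemotron Nano 4B (dense)", 4.0)]:
    print(f"{name}: ~{decode_ceiling(params):.0f} tok/s ceiling")
# Gemma 3 27B lands near ~10 tok/s, so the reported 8 tok/s is plausible but
# close to the wall; the 4B model's ~70 tok/s ceiling leaves clear headroom
# above the 32 tok/s the poster measured.
```

GPT OSS 20B is left out of the dense estimate because it is a mixture-of-experts model that only streams its active experts per token, which is why it can run well above what a whole-weight calculation would suggest.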
// TAGS
lm-studio · llm · inference · benchmark · gpu · self-hosted
DISCOVERED
2026-03-29 (13d ago)
PUBLISHED
2026-03-29 (14d ago)
RELEVANCE
8/10
AUTHOR
nemuro87