OPEN_SOURCE
REDDIT // BENCHMARK RESULT
Gemma 4 31B MTP Runs on MacBooks
On Reddit, a user benchmarked Google’s Gemma 4 31B coding MTP BF16 model on an M5 MacBook with 128GB of RAM via Ollama/Open WebUI and reported roughly 10-12 tokens per second. They noted that llama.cpp support is still missing and want to see broader benchmarks before comparing it against Qwen3.6 27B or Qwen3 Coder Next.
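For anyone wanting to reproduce the measurement, a minimal sketch against Ollama's /api/generate endpoint is shown below. The eval_count and eval_duration fields are part of Ollama's documented response; the gemma4:31b model tag is a placeholder, since the post does not name the exact tag.

```python
# Measure local decode speed via Ollama's HTTP API.
# Assumes an Ollama server on the default port; the model tag is hypothetical.
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "gemma4:31b"  # placeholder tag; substitute whatever tag you pulled

resp = requests.post(OLLAMA_URL, json={
    "model": MODEL,
    "prompt": "Write a Python function that parses an ISO 8601 timestamp.",
    "stream": False,  # return one JSON object including timing stats
})
data = resp.json()

# eval_count = tokens generated; eval_duration = generation time in ns.
tok_per_s = data["eval_count"] / (data["eval_duration"] / 1e9)
print(f"decode speed: {tok_per_s:.1f} tok/s")
```

Running several prompts of varying length and averaging would give a sturdier number than a single generation.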
// ANALYSIS
The interesting part here is not the absolute number, but the gap between Google’s promised MTP speedups and this first real-world local result.
- 10-12 tok/s on an M5 MacBook with 128GB is respectable for a 63GB BF16 dense model, but not an obvious breakthrough (see the bandwidth sketch after this list).
- The test is being run through Ollama/Open WebUI, so the number is framework-dependent and may not reflect the best possible MTP-aware performance.
- llama.cpp support is still missing here, which limits how broadly this can be evaluated right now.
- The real question is how it compares against existing local coding defaults on the same hardware, especially Qwen3.6 27B and Qwen3 Coder Next.
- The post reads as an early signal, not a verdict: promising model, but not yet enough evidence to justify a switch.
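As a rough sanity check on the headline number: dense-model decoding is typically memory-bandwidth-bound, since each generated token streams the full weight set once, so the ceiling is roughly bandwidth divided by model size. The sketch below uses an assumed bandwidth figure, not a published M5 spec.

```python
# Back-of-envelope decode ceiling for a bandwidth-bound dense model.
params = 31e9
bytes_per_param = 2                         # BF16
model_gb = params * bytes_per_param / 1e9   # ~62 GB, matching the ~63 GB reported

assumed_bw_gbs = 800                        # ASSUMPTION: unified-memory bandwidth in GB/s

ceiling_tok_s = assumed_bw_gbs / model_gb
print(f"model size: {model_gb:.0f} GB")
print(f"theoretical ceiling: {ceiling_tok_s:.1f} tok/s")
# ~12.9 tok/s at 800 GB/s: the reported 10-12 tok/s sits near the dense
# ceiling, suggesting the MTP speedup is not being exercised on this path.
```

If that bandwidth assumption is roughly right, the reported figure points to plain autoregressive decode through Ollama, which is consistent with the framework-dependence caveat above.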
// TAGS
gemma4 · llm · benchmark · local-first · edge-ai · inference · ollama · apple-silicon · macbook
DISCOVERED
3h ago
2026-05-06
PUBLISHED
4h ago
2026-05-05
RELEVANCE
8/10
AUTHOR
chimph