Gemma 4 31B MTP Runs on MacBooks
OPEN_SOURCE · REDDIT · 3h ago · BENCHMARK RESULT

On Reddit, a user benchmarked Google’s Gemma 4 31B coding MTP model (BF16) on a MacBook M5 with 128GB RAM via Ollama/Open WebUI and reported roughly 10-12 tokens per second. The poster noted that llama.cpp support is still missing and wants broader benchmarks before comparing the model against Qwen3.6 27B or Qwen3 Coder Next.

// ANALYSIS

The interesting part here is not the absolute number, but the gap between Google’s promised MTP speedups and this first real-world local result.

  • 10-12 tok/s on an M5 MacBook with 128GB is respectable for a 63GB BF16 dense model, but not an obvious breakthrough.
  • The test is being run through Ollama/Open WebUI, so the number is framework-dependent and may not reflect the best possible MTP-aware performance.
  • llama.cpp support is still missing here, which limits how broadly this can be evaluated right now.
  • The real question is how it compares against existing local coding defaults on the same hardware, especially Qwen3.6 27B and Qwen3 Coder Next.
  • The post reads as an early signal, not a verdict: promising model, but not yet enough evidence to justify a switch.
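As a rough sanity check on the reported number, dense decode is typically memory-bandwidth bound: every generated token streams the full weight set, so tokens/s is bounded by bandwidth divided by weight size. A minimal sketch, where the ~550 GB/s unified-memory bandwidth is an assumed Max-class figure (not a published M5 spec):

```python
# Back-of-envelope ceiling for single-token dense decode:
# tokens/s <= memory_bandwidth / bytes_of_weights_read_per_token.

def decode_tok_s(params_billions: float, bytes_per_param: float,
                 bandwidth_gb_s: float) -> float:
    """Upper-bound tokens/s for a dense model that streams all weights per token."""
    weight_gb = params_billions * bytes_per_param  # e.g. 31B x 2 bytes -> 62 GB
    return bandwidth_gb_s / weight_gb

# Assumptions: 31B params, BF16 (2 bytes/param), ~550 GB/s bandwidth.
ceiling = decode_tok_s(31, 2, 550)
print(f"~{ceiling:.1f} tok/s single-token decode ceiling")
```

On these assumptions the single-token ceiling lands near 9 tok/s, so a sustained 10-12 tok/s would suggest the MTP (multi-token prediction) path is contributing rather than the model running as a plain dense decoder; with a different bandwidth figure the conclusion shifts accordingly.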
// TAGS
gemma4 · llm · benchmark · local-first · edge-ai · inference · ollama · apple-silicon · macbook

DISCOVERED

3h ago

2026-05-06

PUBLISHED

4h ago

2026-05-05

RELEVANCE

8 / 10

AUTHOR

chimph