Gemma 4 hits 81 tok/sec on M5 Max
REDDIT · BENCHMARK RESULT

Google's Gemma 4 26B (A4B) reaches 81 tokens per second on Apple's M5 Max, using a Mixture-of-Experts (MoE) design to keep responses fast while peaking at 114W of power draw.

// ANALYSIS

Google's A4B design activates about 4 billion of the model's 26 billion parameters for each token. Because single-stream decoding is largely limited by how many bytes of weights must be read per token, the M5 Max's 614 GB/s of memory bandwidth can deliver speeds previously associated with 7B-class models. At 81 tokens per second (roughly 12 ms per token), latency stays low enough for complex, multi-step agentic tool-calling without frustrating waits. The 114W peak power draw is efficient for this level of throughput, but thermal throttling may still reduce speeds during extended generation sessions. Apple's unified memory remains a major advantage: MoE reduces the weights read per token, not the weights stored, so all 26B parameters must stay resident, and unified memory can hold them without the VRAM limits typical of consumer Nvidia laptop GPUs.
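The bandwidth argument can be checked with back-of-envelope arithmetic. The sketch below assumes single-stream decoding is purely memory-bandwidth bound, so each token streams every active weight once; it ignores KV-cache traffic, activations and kernel overheads. The weight formats (bf16, int8, int4) are illustrative only, since the post does not say which build was benchmarked, and the function names are hypothetical.

```python
# Back-of-envelope decode estimates for an MoE model on unified memory.
# Assumption: decode is memory-bandwidth bound, so each generated token
# reads roughly all *active* weights once. KV-cache reads, activations
# and kernel overheads are ignored, so these are optimistic ceilings.

ACTIVE_PARAMS = 4e9        # ~4B active parameters per token ("A4B")
TOTAL_PARAMS = 26e9        # all 26B parameters must stay resident in memory
BANDWIDTH_GBPS = 614.0     # M5 Max memory bandwidth in GB/s (from the post)
MEASURED_TOK_S = 81.0      # reported decode speed
PEAK_WATTS = 114.0         # reported peak power draw

# Hypothetical weight formats; the post does not state which one was used.
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def decode_ceiling_tok_s(active_params: float, bytes_per_param: float,
                         bandwidth_gbps: float) -> float:
    """Upper bound on tokens/s if every token reads all active weights once."""
    gb_per_token = active_params * bytes_per_param / 1e9
    return bandwidth_gbps / gb_per_token


def resident_weights_gb(total_params: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the weights (inactive experts still count)."""
    return total_params * bytes_per_param / 1e9


if __name__ == "__main__":
    for fmt, bpp in BYTES_PER_PARAM.items():
        ceiling = decode_ceiling_tok_s(ACTIVE_PARAMS, bpp, BANDWIDTH_GBPS)
        footprint = resident_weights_gb(TOTAL_PARAMS, bpp)
        print(f"{fmt:>5}: ceiling ~{ceiling:6.1f} tok/s, weights ~{footprint:5.1f} GB")

    # Per-token latency and energy implied by the reported numbers.
    print(f"latency per token at {MEASURED_TOK_S:.0f} tok/s: "
          f"{1000 / MEASURED_TOK_S:.1f} ms")
    # Uses peak (not average) power, so this is an upper bound.
    print(f"energy per token at peak power: {PEAK_WATTS / MEASURED_TOK_S:.2f} J")
```

Under these assumptions the int8 and int4 ceilings (about 150 and 300 tok/s) sit well above the measured 81 tok/s, while the naive bf16 ceiling (about 77 tok/s) sits just below it, which hints at a quantized build or a slightly lower true active count; the post does not confirm either. The footprint column shows the other side of MoE: only ~4B parameters are read per token, yet all 26B must be held in memory (roughly 13 to 52 GB depending on format). Dividing 114W by 81 tok/s gives about 1.4 J per token, an upper bound because 114W is the peak rather than the average draw.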

// TAGS
gemma-4 · llm · apple-silicon · m5-max · inference · open-weights · moe

DISCOVERED

2026-04-03

PUBLISHED

2026-04-03

RELEVANCE

8/10

AUTHOR

Bderken