Gemma 4 hits 81 tok/sec on M5 Max
Google's Gemma 4 26B (A4B) reaches 81 tokens per second on Apple's M5 Max, using a Mixture-of-Experts (MoE) design to deliver fast responses at a 114W peak power draw.
Google's A4B architecture activates about 4 billion of its 26 billion total parameters per token, so the M5 Max's 614 GB/s of memory bandwidth can deliver inference speeds formerly reserved for 7B-class models. At 81 tokens per second, latency is low enough for complex, multi-step agentic tool-calling without frustrating waits. The 114W peak power draw is modest for this throughput, but thermal throttling remains a consideration for extended generation sessions. Apple's unified memory architecture continues to be a major advantage, letting all 26B weights stay resident in memory without the VRAM limits typical of consumer Nvidia mobile GPUs.
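To see why the active-parameter count matters more than the total, a rough back-of-envelope check helps. The sketch below assumes decoding is purely memory-bandwidth-bound, meaning every generated token reads each active weight once and nothing else. The 26B, 4B, 614 GB/s, 81 tok/s, and 114W figures come from the text above; the bytes-per-weight values are assumptions, since the quantization used in the benchmark is not stated.

```python
# Back-of-envelope estimate for memory-bound decoding.
# From the article: 26B total params, 4B active, 614 GB/s, 81 tok/s, 114 W peak.
# Assumption (not in the article): bytes per weight at each precision.

TOTAL_PARAMS = 26e9
ACTIVE_PARAMS = 4e9
BANDWIDTH_BYTES_PER_S = 614e9
MEASURED_TOK_PER_S = 81.0
PEAK_POWER_W = 114.0

# Hypothetical precisions; the benchmark's actual quantization is unknown.
BYTES_PER_WEIGHT = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def decode_ceiling(active_params, bytes_per_weight, bandwidth):
    """Upper bound on tokens/s if each token reads all active weights once."""
    return bandwidth / (active_params * bytes_per_weight)


def weight_footprint_gb(total_params, bytes_per_weight):
    """Memory needed to hold every weight, in GB (10^9 bytes)."""
    return total_params * bytes_per_weight / 1e9


def joules_per_token(power_w, tok_per_s):
    """Energy per generated token at a given power draw."""
    return power_w / tok_per_s


if __name__ == "__main__":
    for name, bpw in BYTES_PER_WEIGHT.items():
        ceiling = decode_ceiling(ACTIVE_PARAMS, bpw, BANDWIDTH_BYTES_PER_S)
        footprint = weight_footprint_gb(TOTAL_PARAMS, bpw)
        print(f"{name}: ceiling {ceiling:.1f} tok/s, weights {footprint:.1f} GB")
    energy = joules_per_token(PEAK_POWER_W, MEASURED_TOK_PER_S)
    print(f"energy at peak power: {energy:.2f} J/token")
```

Under this simplified model the fp16 ceiling falls slightly below the reported 81 tok/s, which would be consistent with quantized weights, though the article does not say which precision was used. The energy figure uses peak power, so it is best read as a pessimistic estimate rather than a measured average.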
DISCOVERED
2026-04-03
PUBLISHED
2026-04-03
AUTHOR
Bderken