OPEN_SOURCE
REDDIT · 7h ago · BENCHMARK RESULT

Gemma 4 MoE hits 12 t/s on Lunar Lake

Developers are successfully running Google's new Gemma 4 26B MoE models on Intel Lunar Lake integrated graphics via Vulkan. The hardware's on-package memory architecture delivers highly usable inference speeds without requiring a discrete GPU.
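For readers who want to try this themselves, the usual route for Vulkan inference on GGUF models is llama.cpp's Vulkan backend; the thread doesn't confirm which runtime the posters used, so treat this as a plausible setup rather than the reported one. A minimal sketch via the llama-cpp-python bindings follows (the GGUF filename is hypothetical, and the bindings must be compiled with Vulkan support, e.g. CMAKE_ARGS="-DGGML_VULKAN=on" pip install llama-cpp-python):

    from llama_cpp import Llama

    # Hypothetical filename; any Q4-quantized GGUF conversion of the model would do.
    llm = Llama(
        model_path="gemma-4-26b-moe-Q4_K_M.gguf",
        n_gpu_layers=-1,  # offload all layers to the Xe2 iGPU via Vulkan
        n_ctx=4096,
    )

    out = llm("Summarize mixture-of-experts routing in two sentences.", max_tokens=128)
    print(out["choices"][0]["text"])

With n_gpu_layers=-1, every transformer layer runs on the iGPU; because Lunar Lake's memory is unified, "offloading" here costs no PCIe copy.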

// ANALYSIS

Intel's Lunar Lake architecture is quietly becoming a powerhouse for local LLM inference, proving that high memory bandwidth can offset the lack of a discrete GPU.

  • The Gemma 4 26B Mixture of Experts (MoE) model hits a hardware sweet spot by activating only ~4B parameters per token (a rough decode ceiling based on this figure is sketched after the list)
  • Lunar Lake's 32GB of on-package LPDDR5X feeds the iGPU from unified memory, so model weights never cross a discrete-GPU bus and the full memory bandwidth is available to inference
  • While native OpenVINO optimization currently struggles with 20B+ models on the NPU, community-compiled Vulkan builds are effectively leveraging the Xe2 iGPU instead
  • Achieving 7-12 tokens per second for a 26B model on a thin-and-light laptop significantly lowers the hardware barrier for local AI development
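As a sanity check on those numbers, a bandwidth-bound decode ceiling can be estimated from the active parameter count alone. The sketch below assumes a 128-bit LPDDR5X-8533 bus (~136 GB/s peak) and roughly 4.5 bits per weight for a Q4_K-style quantization; both figures are assumptions, not taken from the post:

    # Back-of-envelope decode ceiling: each generated token must stream the
    # active expert weights through memory at least once.
    bus_bytes = 128 / 8            # assumed 128-bit LPDDR5X bus width
    transfers = 8533e6             # assumed LPDDR5X-8533 rate (transfers/s)
    bandwidth = bus_bytes * transfers            # ~136.5e9 B/s peak

    active_params = 4e9            # ~4B active parameters per token (from the post)
    bytes_per_param = 4.5 / 8      # assumed Q4_K-style quantization, incl. overhead
    bytes_per_token = active_params * bytes_per_param  # ~2.25 GB per token

    print(f"ceiling: ~{bandwidth / bytes_per_token:.0f} tok/s")  # ~61 tok/s

Under these assumptions, the observed 7-12 t/s sits well below the ~60 t/s memory ceiling, suggesting the Xe2 iGPU is compute- or driver-limited under Vulkan rather than starved for bandwidth, and that there is headroom for software optimization.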
// TAGS
gemma-4 · llm · inference · edge-ai · open-weights

DISCOVERED: 2026-04-12 (7h ago)

PUBLISHED: 2026-04-12 (10h ago)

RELEVANCE: 8/10

AUTHOR: No-Key8555