OPEN_SOURCE
REDDIT · 7h ago · BENCHMARK RESULT
Gemma 4 MoE hits 12 t/s on Lunar Lake
Developers are successfully running Google's new Gemma 4 26B MoE models on Intel Lunar Lake integrated graphics via Vulkan. The hardware's on-package memory architecture delivers usable inference speeds of 7-12 tokens per second without requiring a discrete GPU.
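The post does not name the runtime, but the common community path for Vulkan inference on an Intel iGPU is llama.cpp compiled with its Vulkan backend. Below is a minimal sketch of such a run driven from Python; the binary path, model filename, and generation settings are assumptions, not details from the post:

```python
# Minimal sketch: launch a llama.cpp Vulkan-backend run from Python.
# Assumes llama.cpp was built with -DGGML_VULKAN=ON and that a 4-bit
# GGUF of the model exists at the (hypothetical) path below.
import subprocess

MODEL = "gemma-4-26b-moe-q4_k_m.gguf"  # hypothetical filename

cmd = [
    "./build/bin/llama-cli",
    "-m", MODEL,
    "-ngl", "99",   # offload all layers to the Vulkan device (the Xe2 iGPU)
    "-n", "256",    # number of tokens to generate
    "-p", "Explain mixture-of-experts routing in two sentences.",
]

# llama-cli prints a tokens/second summary at the end of the run,
# which is where figures like the 7-12 t/s reported here come from.
subprocess.run(cmd, check=True)
```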
// ANALYSIS
Intel's Lunar Lake architecture is quietly becoming a powerhouse for local LLM inference, proving that high memory bandwidth can offset the lack of a discrete GPU.
- The Gemma 4 26B Mixture of Experts (MoE) model hits a hardware sweet spot by activating only ~4B parameters per token
- Lunar Lake's 32GB of on-package LPDDR5X memory eliminates the traditional CPU-to-GPU bus latency, providing the crucial bandwidth needed for large models
- While native OpenVINO optimization currently struggles with 20B+ models on the NPU, community-compiled Vulkan bridges are effectively leveraging the Xe2 iGPU
- Achieving 7-12 tokens per second for a 26B model on a thin-and-light laptop significantly lowers the hardware barrier for local AI development (a rough bandwidth sanity check is sketched below)
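Those throughput figures are consistent with a simple memory-bandwidth estimate. A quick check, assuming ~136 GB/s of LPDDR5X-8533 bandwidth and a 4-bit quantization at roughly 0.57 bytes per parameter (neither figure is stated in the post):

```python
# Back-of-the-envelope check on the reported 7-12 tokens/s.
# Assumptions (not from the post): ~136 GB/s of on-package LPDDR5X-8533
# bandwidth on Lunar Lake, and a 4-bit quant (~0.57 bytes/parameter) so each
# decoded token streams the ~4B active parameters from memory once.
BANDWIDTH_BPS = 136e9      # assumed memory bandwidth, bytes/s
ACTIVE_PARAMS = 4e9        # ~4B parameters activated per token (MoE)
BYTES_PER_PARAM = 0.57     # rough Q4_K-class quantization footprint

bytes_per_token = ACTIVE_PARAMS * BYTES_PER_PARAM   # ~2.3 GB read per token
ceiling_tps = BANDWIDTH_BPS / bytes_per_token       # memory-bound ceiling, ~60 t/s

for reported in (7.0, 12.0):
    print(f"{reported:.0f} t/s is {reported / ceiling_tps:.0%} of the bandwidth ceiling")
# 7-12 t/s works out to roughly 12-20% of the theoretical ceiling, a plausible
# fraction for an early community Vulkan path on an iGPU.
```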
// TAGS
gemma-4 · llm · inference · edge-ai · open-weights
DISCOVERED
7h ago
2026-04-12
PUBLISHED
10h ago
2026-04-12
RELEVANCE
8/10
AUTHOR
No-Key8555