OPEN_SOURCE
REDDIT // 4h ago // INFRASTRUCTURE
Intel Core Ultra NPUs run LLMs via OpenVINO
Intel's NPUs are graduating from specialized image tasks to general-purpose LLM inference. New benchmarks show the NPU achieving 6-12 tokens/sec on 8B models, offering a power-efficient alternative to GPUs for local AI.
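For context on how small the entry barrier is: running an 8B-class model on the NPU goes through OpenVINO's GenAI pipeline in a few lines. A minimal sketch, assuming a model already exported to OpenVINO IR with INT4 weights (the directory name here is hypothetical):

```python
import openvino_genai as ov_genai

# Hypothetical path to a model already exported to OpenVINO IR
# (e.g. via optimum-intel); NPU inference expects compressed weights.
MODEL_DIR = "llama-3.1-8b-instruct-int4-ov"

# "NPU" targets the Core Ultra NPU; "GPU" or "CPU" also work as devices.
pipe = ov_genai.LLMPipeline(MODEL_DIR, "NPU")

# Throughput lands in the 6-12 tokens/sec range per the benchmarks above.
result = pipe.generate(
    "Summarize the tradeoffs of NPU vs iGPU inference.",
    max_new_tokens=128,
)
print(result)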
// ANALYSIS
The shift toward NPU-based LLM inference is the defining feature of the "AI PC" era, prioritizing battery life over raw speed.
- OpenVINO 2026.0 introduces speculative decoding on NPUs, significantly narrowing the latency gap with iGPUs (see the sketch after this list).
- While an iGPU can hit 20 tokens/sec, the NPU's 10 tokens/sec draws 3x-5x less power, ideal for persistent background assistants; the arithmetic below works out the energy per token.
- Real-world testing confirms secure, GDPR-compliant local processing is viable, such as batch-processing 15,000 images in 7 hours on mobile hardware.
- Expanded model support for Qwen2.5 and MiniCPM indicates a rapidly maturing software ecosystem for local edge inference (an export sketch follows this list).
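The speculative-decoding claim maps onto OpenVINO GenAI's existing draft-model API. A hedged sketch, assuming the claimed 2026.0 NPU support keeps the same shape as today's CPU/GPU samples; both model paths are hypothetical:

```python
import openvino_genai as ov_genai

# Hypothetical paths: a large target model plus a small draft model from
# the same tokenizer family, both already exported to OpenVINO IR.
TARGET_DIR = "qwen2.5-7b-instruct-int4-ov"
DRAFT_DIR = "qwen2.5-0.5b-instruct-int4-ov"

# The draft model proposes tokens cheaply; the target verifies them in
# batches, which is where the latency win over plain decoding comes from.
pipe = ov_genai.LLMPipeline(
    TARGET_DIR, "NPU",
    draft_model=ov_genai.draft_model(DRAFT_DIR, "NPU"),
)

config = ov_genai.GenerationConfig()
config.max_new_tokens = 128
config.num_assistant_tokens = 5  # draft tokens proposed per verification step

print(pipe.generate("Explain speculative decoding in one paragraph.", config))
```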
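The power bullet implies the NPU still wins on energy per token despite half the throughput. A quick back-of-the-envelope, where the iGPU wattage is a hypothetical baseline (only the ratio matters):

```python
# Energy per token = power / throughput. Throughputs come from the post;
# the 12 W iGPU figure is a hypothetical baseline for illustration.
IGPU_TPS, NPU_TPS = 20.0, 10.0   # tokens/sec
IGPU_WATTS = 12.0

for power_factor in (3, 5):      # NPU draws 3x-5x less power per the post
    j_per_tok_igpu = IGPU_WATTS / IGPU_TPS
    j_per_tok_npu = (IGPU_WATTS / power_factor) / NPU_TPS
    print(f"{power_factor}x less power -> NPU uses "
          f"{j_per_tok_igpu / j_per_tok_npu:.1f}x less energy per token")
# 3x -> 1.5x less energy/token; 5x -> 2.5x less, even at half the speed.
```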
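On the model-support point, getting a Qwen2.5 checkpoint into OpenVINO IR goes through optimum-intel. A sketch, assuming the public Hugging Face checkpoint and the INT4 weight compression commonly used for NPU deployment:

```python
from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig
from transformers import AutoTokenizer

model_id = "Qwen/Qwen2.5-7B-Instruct"

# export=True converts the transformers checkpoint to OpenVINO IR on the fly;
# 4-bit weight compression keeps the 7B model small enough for NPU memory.
model = OVModelForCausalLM.from_pretrained(
    model_id,
    export=True,
    quantization_config=OVWeightQuantizationConfig(bits=4),
)
model.save_pretrained("qwen2.5-7b-instruct-int4-ov")
AutoTokenizer.from_pretrained(model_id).save_pretrained("qwen2.5-7b-instruct-int4-ov")
```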
// TAGS
intel · npu · llm · edge-ai · open-source · openvino · intel-core-ultra-npu
DISCOVERED
4h ago
2026-04-12
PUBLISHED
6h ago
2026-04-12
RELEVANCE
8/10
AUTHOR
wossnameX