Intel Core Ultra NPUs run LLMs via OpenVINO

// 45d ago · INFRASTRUCTURE

Intel's NPUs are graduating from specialized image tasks to general-purpose LLM inference. New benchmarks show the NPU reaching 6-12 tokens/sec on 8B-parameter models, making it a power-efficient alternative to GPUs for local AI.

// ANALYSIS

The shift toward NPU-based LLM inference is the defining feature of the "AI PC" era, prioritizing battery life over raw speed.

– OpenVINO 2026.0 introduces speculative decoding on NPUs, significantly narrowing the latency gap with iGPUs.
– An iGPU can hit 20 tokens/sec, while the NPU manages 10 tokens/sec at 3x-5x lower power draw. At half the throughput, that works out to roughly 1.5x-2.5x less energy per token, which suits persistent background assistants.
– Real-world testing confirms that secure, GDPR-compliant local processing is viable. One example: batch processing 15,000 images in 7 hours (about 1.7 seconds per image) on mobile hardware.
– Expanded model support for Qwen2.5 and MiniCPM indicates a rapidly maturing software ecosystem for local edge inference.
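
For readers unfamiliar with speculative decoding, the idea mentioned in the first point can be sketched in a few lines. This is a generic, greedy-verification toy in plain Python, not OpenVINO's implementation: the names `speculative_decode`, `toy_target`, and `toy_draft` are hypothetical, and the "models" are stand-in functions. A cheap draft model proposes `k` tokens, and the expensive target model checks them in what a real runtime would run as one batched forward pass, so several tokens can be accepted per target pass.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]  # greedy "model": context -> next token


def greedy_decode(target: NextToken, prompt: List[Token], max_new_tokens: int) -> List[Token]:
    """Baseline: one target-model pass per generated token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(target(tokens))
    return tokens


def speculative_decode(target: NextToken, draft: NextToken, prompt: List[Token],
                       max_new_tokens: int, k: int = 4) -> Tuple[List[Token], int]:
    """Greedy speculative decoding. Returns (tokens, number of verification rounds)."""
    tokens = list(prompt)
    goal = len(prompt) + max_new_tokens
    target_passes = 0
    while len(tokens) < goal:
        # 1. The draft model proposes up to k tokens autoregressively.
        n = min(k, goal - len(tokens))
        proposal: List[Token] = []
        for _ in range(n):
            proposal.append(draft(tokens + proposal))

        # 2. The target model verifies every proposed position. A real runtime
        #    does this in a single batched pass; this toy calls it per position
        #    and counts the whole round as one pass.
        target_passes += 1
        accepted: List[Token] = []
        for i, guess in enumerate(proposal):
            expected = target(tokens + proposal[:i])
            if guess == expected:
                accepted.append(guess)
            else:
                accepted.append(expected)  # keep the target's correction, drop the rest
                break
        else:
            # Every guess matched: the same pass also yields one bonus token.
            if len(tokens) + len(accepted) < goal:
                accepted.append(target(tokens + accepted))
        tokens.extend(accepted)
    return tokens, target_passes


# Hypothetical stand-in models: the draft agrees with the target except when
# the context length is a multiple of 5.
def toy_target(context: List[Token]) -> Token:
    return (sum(context) * 3 + 1) % 7


def toy_draft(context: List[Token]) -> Token:
    guess = toy_target(context)
    return (guess + 1) % 7 if len(context) % 5 == 0 else guess


if __name__ == "__main__":
    out, passes = speculative_decode(toy_target, toy_draft, [1, 2, 3], 20, k=4)
    print("same output as plain greedy:", out == greedy_decode(toy_target, [1, 2, 3], 20))
    print("target verification rounds:", passes, "vs 20 for plain greedy")
```

The output is identical to plain greedy decoding with the target model; only the number of expensive target passes changes. How much that helps in practice depends on how often the draft model agrees with the target and on how cheap the draft is to run.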

// TAGS

intel, npu, llm, edge-ai, open-source, openvino, intel-core-ultra-npu

DISCOVERED: 2026-04-12 (45d ago)

PUBLISHED: 2026-04-12 (45d ago)

RELEVANCE: 8/10

AUTHOR: wossnameX
