Intel Arc Pro B70 impresses, frustrates
The Arc Pro B70 can push impressive local-LLM throughput, with the author hitting 235 t/s on Gemma 3 27B in vLLM across 100 requests. But the surrounding software stack is still rough: MoE support is partial, new-architecture quantization is fragile, and containerized setup is full of sharp edges.
This reads like strong hardware trapped inside immature tooling. The throughput result is the headline win: 32GB of VRAM and high bandwidth make the B70 credible for serious local inference. The pain points are ecosystem-level, not just user error: MoE support, quantization pipelines, and container driver integration are still brittle. The move to vLLM is sensible for request density, but the software maturity gap makes "best on paper" different from "pleasant to run."
DISCOVERED
3d ago
2026-04-09
PUBLISHED
3d ago
2026-04-09
RELEVANCE
AUTHOR
Icy_Gur6890