Intel Arc Pro B70 hits 282 t/s prompt eval
A Reddit user reports strong local LLM results with the 32GB Intel Arc Pro B70 (Battlemage) in a legacy HP Z640 workstation. The SYCL-based setup reaches 282 tokens per second on prompt evaluation for a 35B-parameter model, suggesting that current Intel GPUs are viable for high-VRAM AI workloads on aging hardware.
The report suggests that llama.cpp's SYCL backend has matured enough to deliver practical speeds, and that it significantly outperforms Vulkan on Battlemage hardware. Because the card ran in a PCIe 3.0 system, the result also indicates that the older bus is not a serious bottleneck for this workload, which extends the useful life of legacy workstations. The jump in prompt-evaluation performance further suggests that Intel's driver-level optimizations for Flash Attention are delivering competitive throughput. At $949, the card can run large models such as Qwen 3.6 35B with a 130k-token context window entirely in VRAM, undercutting the "Nvidia tax" for local inference.
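To put the 282 t/s figure in practical terms, the sketch below converts it into wall-clock time for prompts of different lengths. It assumes the reported rate holds constant across context sizes, which the report does not state; prompt-eval throughput typically drops as context grows, so treat the results as optimistic. The function name and the sample prompt sizes are illustrative, not taken from the report.

```python
def prompt_eval_seconds(prompt_tokens: int, tokens_per_second: float) -> float:
    """Seconds needed to ingest a prompt at a constant prompt-eval rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return prompt_tokens / tokens_per_second


# Reported prompt-eval rate (tokens per second) from the Reddit post.
REPORTED_RATE = 282.0

# Hypothetical prompt sizes; 130_000 matches the 130k context mentioned above.
for n in (4_000, 32_000, 130_000):
    secs = prompt_eval_seconds(n, REPORTED_RATE)
    print(f"{n:>7} tokens: {secs:6.0f} s ({secs / 60:.1f} min)")
```

Under that constant-rate assumption, filling the full 130k-token context would take roughly 461 seconds, or a little under eight minutes, before the first generated token appears.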
DISCOVERED: 2026-04-19
PUBLISHED: 2026-04-19
AUTHOR: Serious_Rub_3674