Intel Arc Pro B70 hits 282 t/s prompt eval

// 104d ago · BENCHMARK RESULT

A Reddit user reports strong local LLM performance from the 32GB Intel Arc Pro B70 (Battlemage) installed in a legacy HP Z640 workstation. The SYCL-based setup reached 282 tokens per second on prompt evaluation for a 35B-parameter model, which suggests that current Intel GPUs are viable for high-VRAM AI workloads on aging hardware.

// ANALYSIS

The report suggests that llama.cpp’s SYCL backend is now mature enough for practical local inference, with the user seeing significantly better results than with Vulkan on Battlemage hardware. Because the card performed well in a PCIe 3.0 system, where bus bandwidth matters little once the model is fully resident in VRAM, the result also points to a longer useful life for legacy workstations. The strong prompt-evaluation figure further suggests that Intel's driver-level optimizations for Flash Attention are delivering competitive throughput. At $949, the card can run large models such as Qwen 3.6 35B with a 130k-token context window entirely in VRAM, undercutting the "Nvidia tax" for local inference.
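
To put the headline figure in perspective, the following back-of-the-envelope sketch estimates how long prompt evaluation would take at the reported rate. It assumes throughput stays constant at 282 t/s regardless of prompt length (real throughput usually falls as context grows, so these are optimistic estimates) and reads "130k" as 130,000 tokens. The function name `prompt_eval_seconds` and the sample prompt sizes are illustrative and do not come from the original post.

```python
def prompt_eval_seconds(prompt_tokens: int, tokens_per_second: float) -> float:
    """Estimate prompt-evaluation time, assuming a constant throughput."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return prompt_tokens / tokens_per_second


# Reported prompt-eval rate from the post; assumed constant here.
REPORTED_TPS = 282.0

# Hypothetical prompt sizes, up to the 130k context mentioned above.
for n_tokens in (4_000, 32_000, 130_000):
    seconds = prompt_eval_seconds(n_tokens, REPORTED_TPS)
    print(f"{n_tokens:>7} tokens -> {seconds:6.1f} s ({seconds / 60:.1f} min)")
```

Under these assumptions, filling the full 130k context would take about 461 seconds, a little under eight minutes, before the first output token appears.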

// TAGS

llm, gpu, edge-ai, open-source, intel-arc-pro-b70, llama-cpp, inference, benchmark

DISCOVERED

104d ago

2026-04-19

PUBLISHED

104d ago

2026-04-19

RELEVANCE

8/10

AUTHOR

Serious_Rub_3674
