OPEN_SOURCE
REDDIT · 5d ago · PRODUCT UPDATE

Llama.cpp triples Q8_0 speed on Intel Arc

A performance fix for the llama.cpp SYCL backend delivers a 3.1x speedup for Q8_0 quantization on Intel Arc GPUs. By implementing a "reorder" optimization that separates scale factors from weight data, the update enables coalesced memory access and boosts bandwidth utilization from 21% to 66%.
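The "reorder" idea can be sketched in a few lines. The 34-byte block layout below (one fp16 scale plus 32 int8 weights, `QK8_0 = 32`) matches ggml's `block_q8_0` definition; the reorder function itself is an illustrative sketch of the transformation, not the actual SYCL backend code:

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Simplified Q8_0 block: one fp16 scale followed by 32 int8 weights,
// giving the 34-byte stride discussed above. (QK8_0 = 32 in ggml.)
constexpr int QK8_0 = 32;

struct block_q8_0 {
    uint16_t d;          // fp16 scale, stored as raw bits here for simplicity
    int8_t   qs[QK8_0];  // quantized weights
};

// "Reorder": pack all weight bytes contiguously, then all scales,
// so neighbouring GPU threads read neighbouring addresses.
void reorder_q8_0(const block_q8_0 *src, size_t nblocks,
                  std::vector<int8_t> &qs_out, std::vector<uint16_t> &d_out) {
    qs_out.resize(nblocks * QK8_0);
    d_out.resize(nblocks);
    for (size_t b = 0; b < nblocks; ++b) {
        std::memcpy(&qs_out[b * QK8_0], src[b].qs, QK8_0);
        d_out[b] = src[b].d;
    }
}
```

After the split, the weight array has a clean power-of-two stride and the scales can be fetched separately once per block.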

// ANALYSIS

This optimization is a massive win for the Intel Xe2 ecosystem, finally unlocking the bandwidth potential of Battlemage GPUs for high-precision local LLM inference.

  • Q8_0 quantization previously suffered from non-coalesced memory access due to its non-power-of-two 34-byte block size (a 2-byte fp16 scale plus 32 int8 weights).
  • Reordering allows for efficient GPU cache usage, pushing performance on Qwen3.5-27B from 4.88 t/s to over 15 t/s.
  • The speedup makes Q8_0 quantization faster than the less-precise Q6_K on Intel hardware.
  • The fix also addresses a silent bug in SYCL buffer initialization that prevented the optimization from being enabled for Q8_0 tensors.
  • Verification included binary-patching Intel's closed-source IPEX-LLM to confirm the hardware's theoretical limits before implementing the open-source fix.
// TAGS
llama-cpp · llm · gpu · open-source · inference · sycl · intel-arc

DISCOVERED

5d ago (2026-04-06)

PUBLISHED

5d ago (2026-04-06)

RELEVANCE

8/10

AUTHOR

Katostrofik