OPEN_SOURCE
REDDIT // BENCHMARK RESULT

Qwen3.5-35B-A3B Lags on Intel Arc B60

A LocalLLaMA user asked whether Qwen3.5-35B-A3B at Q4 quantization can reach strong llama.cpp inference speeds on an Intel Arc B60, using an RX 7900 XTX result of about 80 tokens/s as the comparison point. The only reply in the thread points to a linked forum post reporting roughly 8 tokens/s on the B60, which makes the Intel card look unappealing for this specific workload unless the software stack can be tuned substantially.
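
The thread's numbers are llama.cpp-style decode measurements. As a minimal sketch of how to reproduce that kind of figure yourself, the Python snippet below uses the llama-cpp-python bindings; the GGUF filename and generation settings are hypothetical placeholders, and the wheel must be built against the backend under test (SYCL or Vulkan for Arc, ROCm/HIP for the Radeon card).

# Minimal decode-throughput probe via llama-cpp-python (pip install llama-cpp-python).
import time
from llama_cpp import Llama

llm = Llama(
    model_path="qwen3.5-35b-a3b-q4_k_m.gguf",  # hypothetical filename; use your own Q4 GGUF
    n_gpu_layers=-1,   # offload every layer to the GPU
    n_ctx=4096,
    verbose=False,
)

prompt = "Explain the difference between dense and mixture-of-experts models."
start = time.perf_counter()
out = llm(prompt, max_tokens=256, temperature=0.0)
elapsed = time.perf_counter() - start

generated = out["usage"]["completion_tokens"]
print(f"{generated} tokens in {elapsed:.2f}s -> {generated / elapsed:.1f} tok/s")
# Note: this lumps prompt processing in with decode; llama.cpp's own llama-bench
# tool reports prefill (pp) and token generation (tg) separately.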

// ANALYSIS

Hot take: this looks like a backend-and-driver story more than a raw hardware story, and the early signal is that Arc B60 is not an obvious upgrade for this model.

  • The post is asking for real-world inference data, not announcing a new model or feature.
  • The OP’s baseline is strong: about 80 tps on an RX 7900 XTX with llama.cpp.
  • The only cited Arc B60 datapoint in the thread is roughly 8 t/s, which is an order of magnitude lower.
  • Qwen3.5-35B-A3B is a MoE model, so performance will vary a lot with runtime support, quantization, and expert-routing efficiency.
  • Official Qwen docs emphasize recent inference stacks like vLLM and SGLang; this discussion is specifically about llama.cpp, so results may not transfer cleanly.
  • Inference: the B60’s 24 GB of VRAM alone is not enough to predict good throughput here; software maturity may matter more, as the back-of-envelope roofline sketch below suggests.
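
To see why the hot take leans toward software rather than silicon, here is a back-of-envelope roofline check, assuming MoE decode is memory-bandwidth bound. Every constant is an assumption rather than a measurement: ~3B active parameters per token (the usual reading of the "A3B" suffix), ~4.5 effective bits per weight for Q4_K-style quantization, and approximate vendor-sheet bandwidth figures.

# Roofline sketch for memory-bandwidth-bound MoE decode. All constants are assumptions.
ACTIVE_PARAMS = 3e9          # assumed active parameters per token for "A3B"
BITS_PER_WEIGHT = 4.5        # rough effective size of Q4_K-style quantization
bytes_per_token = ACTIVE_PARAMS * BITS_PER_WEIGHT / 8

for name, bw_gb_s, measured in [("RX 7900 XTX", 960, 80), ("Arc B60", 456, 8)]:
    ceiling = bw_gb_s * 1e9 / bytes_per_token   # tokens/s if bandwidth were the only limit
    print(f"{name}: roofline ~{ceiling:.0f} tok/s, measured {measured} tok/s "
          f"({measured / ceiling:.0%} of ceiling)")

Under those assumptions the XTX’s 80 tok/s is about 14% of its naive ceiling, while the B60’s 8 tok/s is about 3% of its own; a utilization gap that wide usually points at kernel and backend maturity rather than the silicon itself.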
// TAGS
qwen, qwen3.5, local-llm, inference, benchmark, intel-arc, llama.cpp

DISCOVERED

2026-03-21

PUBLISHED

2026-03-21

RELEVANCE

6/10

AUTHOR

LeDynamique