OPEN_SOURCE
REDDIT // BENCHMARK RESULT

Qwen3.5-35B-A3B Lags on Intel Arc B60

A LocalLLaMA user asked whether Qwen3.5-35B-A3B at Q4 quantization can reach strong llama.cpp inference speeds on an Intel Arc B60, using an RX 7900 XTX result of about 80 tokens/s as the comparison point. The only reply in the thread points to a linked forum post reporting roughly 8 tokens/s on the B60, which makes the Intel card look unappealing for this specific workload unless the software stack can be tuned substantially.
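
The thread's numbers are llama.cpp-style decode measurements. As a minimal sketch of how to reproduce that kind of figure yourself, the Python snippet below uses the llama-cpp-python bindings; the GGUF filename and generation settings are hypothetical placeholders, and the wheel must be built against the backend under test (SYCL or Vulkan for Arc, ROCm/HIP for the Radeon card).

# Minimal decode-throughput probe via llama-cpp-python (pip install llama-cpp-python).
import time
from llama_cpp import Llama

llm = Llama(
    model_path="qwen3.5-35b-a3b-q4_k_m.gguf",  # hypothetical filename; use your own Q4 GGUF
    n_gpu_layers=-1,   # offload every layer to the GPU
    n_ctx=4096,
    verbose=False,
)

prompt = "Explain the difference between dense and mixture-of-experts models."
start = time.perf_counter()
out = llm(prompt, max_tokens=256, temperature=0.0)
elapsed = time.perf_counter() - start

generated = out["usage"]["completion_tokens"]
print(f"{generated} tokens in {elapsed:.2f}s -> {generated / elapsed:.1f} tok/s")
# Note: this lumps prompt processing in with decode; llama.cpp's own llama-bench
# tool reports prefill (pp) and token generation (tg) separately.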

// ANALYSIS

Hot take: this looks like a backend-and-driver story more than a raw hardware story, and the early signal is that Arc B60 is not an obvious upgrade for this model.

  • The post is asking for real-world inference data, not announcing a new model or feature.
  • The OP’s baseline is strong: about 80 tps on an RX 7900 XTX with llama.cpp.
  • The only cited Arc B60 datapoint in the thread is roughly 8 t/s, which is an order of magnitude lower.
  • Qwen3.5-35B-A3B is a MoE model, so performance will vary a lot with runtime support, quantization, and expert-routing efficiency.
  • Official Qwen docs emphasize recent inference stacks like vLLM and SGLang; this discussion is specifically about llama.cpp, so results may not transfer cleanly.
  • Inference: the B60’s 24 GB of VRAM alone is not enough to predict good throughput here; software maturity may matter more, as the back-of-envelope roofline sketch below suggests.
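
To see why the hot take leans toward software rather than silicon, here is a back-of-envelope roofline check, assuming MoE decode is memory-bandwidth bound. Every constant is an assumption rather than a measurement: ~3B active parameters per token (the usual reading of the "A3B" suffix), ~4.5 effective bits per weight for Q4_K-style quantization, and approximate vendor-sheet bandwidth figures.

# Roofline sketch for memory-bandwidth-bound MoE decode. All constants are assumptions.
ACTIVE_PARAMS = 3e9          # assumed active parameters per token for "A3B"
BITS_PER_WEIGHT = 4.5        # rough effective size of Q4_K-style quantization
bytes_per_token = ACTIVE_PARAMS * BITS_PER_WEIGHT / 8

for name, bw_gb_s, measured in [("RX 7900 XTX", 960, 80), ("Arc B60", 456, 8)]:
    ceiling = bw_gb_s * 1e9 / bytes_per_token   # tokens/s if bandwidth were the only limit
    print(f"{name}: roofline ~{ceiling:.0f} tok/s, measured {measured} tok/s "
          f"({measured / ceiling:.0%} of ceiling)")

Under those assumptions the XTX’s 80 tok/s is about 14% of its naive ceiling, while the B60’s 8 tok/s is about 3% of its own; a utilization gap that wide usually points at kernel and backend maturity rather than the silicon itself.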
// TAGS
qwen, qwen3.5, local-llm, inference, benchmark, intel-arc, llama.cpp

DISCOVERED

2026-03-21

PUBLISHED

2026-03-21

RELEVANCE

6/10

AUTHOR

LeDynamique