REDDIT · 4h ago · BENCHMARK RESULT

Qwen3.6 35B-A3B runs on Radeon 780M iGPU

A Reddit user benchmarked Qwen3.6-35B-A3B GGUF running in llama.cpp on a ThinkPad T14 Gen 5 with a Radeon 780M iGPU and 64GB RAM, reporting strong Vulkan performance at Q8_0: roughly 282 tok/s prompt processing and about 20.7 tok/s generation on a 1024-token test. They also note that Q6_K needed kernel-parameter tweaks for a larger GTT allocation and a longer GPU hang timeout, but then worked well even at full context. That suggests the model is practical on high-memory consumer iGPUs rather than being confined to discrete-GPU rigs.
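The post doesn't quote the exact kernel parameters. On Linux, GTT size and the GPU hang timeout are typically raised via amdgpu module options along these lines (the values below are illustrative assumptions, not taken from the post):

```shell
# Illustrative amdgpu kernel parameters (values are assumptions, not from the post).
# gttsize raises the GTT limit -- the amount of system RAM the iGPU can address -- in MiB;
# lockup_timeout lengthens the per-job hang timeout (in ms) before the driver resets the GPU,
# which helps long prompt-processing kernels on slow iGPUs.
# Typically appended to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, then update-grub and reboot.
amdgpu.gttsize=49152 amdgpu.lockup_timeout=10000
```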

// ANALYSIS

This is less a model launch story than a "local inference viability" story, and that's the interesting part.

  • The headline result is strong enough to matter: a 35B MoE model is running usefully on an integrated 780M GPU with Vulkan acceleration.
  • The numbers point to a workable local setup: ~282 tok/s prefill against ~20.7 tok/s decode means prompt-heavy workloads (long-context summarization, RAG) benefit most.
  • The Q6_K note is important: this is not plug-and-play for every kernel/config, but the fact that it becomes stable with tuning makes it credible for enthusiasts.
  • The post is strongest as a benchmark/result item because it reports concrete hardware, backend, quantization, and throughput.
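The post's exact invocation isn't given; a setup like the following would reproduce this kind of measurement with llama.cpp's Vulkan backend and llama-bench (the model filename and flag values are assumptions):

```shell
# Build llama.cpp with its Vulkan backend, then measure prompt-processing (pp)
# and token-generation (tg) throughput; -ngl 99 offloads all layers to the iGPU.
# Model filename and test lengths are illustrative, not quoted from the post.
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release -j
./build/bin/llama-bench -m Qwen3.6-35B-A3B-Q8_0.gguf -ngl 99 -p 1024 -n 128
```

llama-bench reports pp (prefill) and tg (generation) rates separately, which is the same split the post's 282 vs 20.7 tok/s figures reflect.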
// TAGS
qwen, qwen3.6, llama.cpp, vulkan, gguf, local-llm, amdgpu, radeon-780m, benchmark, moe

DISCOVERED

4h ago

2026-04-24

PUBLISHED

7h ago

2026-04-24

RELEVANCE

9/10

AUTHOR

itroot