Qwen3.6 35B-A3B runs on Radeon 780M iGPU

// 45d ago · BENCHMARK RESULT

A Reddit user benchmarked a Qwen3.6-35B-A3B GGUF in llama.cpp on a ThinkPad T14 Gen 5 with a Radeon 780M iGPU and 64GB of RAM. Using the Vulkan backend at Q8_0, they report roughly 282 tok/s prompt processing and about 20.7 tok/s generation on a 1024-token test. Q6_K needed kernel parameter tweaks (a larger GTT and a longer hang timeout) but then ran well even at full context, which suggests the model is practical on high-memory consumer iGPUs rather than confined to discrete-GPU rigs.
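
To put the 64GB RAM figure in context, here is a rough sketch of how large the quantized weights might be. It rests on assumptions not stated in the post: the total parameter count is taken as the nominal 35B from the model name, and the bits-per-weight values are the block-format averages for llama.cpp's Q8_0 and Q6_K types. Real GGUF files can mix tensor types, and the KV cache and compute buffers are not counted, so treat the output as a ballpark only.

```python
# Rough weight-size estimate for a quantized GGUF.
# Assumptions (not from the post): nominal 35e9 total parameters, and the
# average bits per weight of the Q8_0 and Q6_K block formats.
BITS_PER_WEIGHT = {"Q8_0": 8.5, "Q6_K": 6.5625}
TOTAL_PARAMS = 35e9  # nominal; the real parameter count may differ


def weights_gib(total_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GiB."""
    return total_params * bits_per_weight / 8 / 2**30


if __name__ == "__main__":
    for name, bpw in BITS_PER_WEIGHT.items():
        size = weights_gib(TOTAL_PARAMS, bpw)
        print(f"{name}: ~{size:.1f} GiB of weights")
```

Under these assumptions both quantizations fit in 64GB with room left over, but either one is a large share of system RAM for an iGPU to map, which is one plausible reason the GTT size comes up in the post.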

// ANALYSIS

This is less a model launch story than a "local inference viability" story, and that's the interesting part.

  • The headline result is strong enough to matter: a 35B MoE model is running usefully on an integrated 780M GPU with Vulkan acceleration.
  • The numbers point to a very workable local setup, especially for prompt-heavy workloads where prefill speed matters.
  • The Q6_K note is important: this is not plug-and-play for every kernel/config, but the fact that it becomes stable with tuning makes it credible for enthusiasts.
  • The post is strongest as a benchmark/result item because it reports concrete hardware, backend, quantization, and throughput.
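
The reported rates can be turned into a rough end-to-end latency estimate. The sketch below is only an approximation: it assumes the 282 tok/s prefill and 20.7 tok/s generation figures hold at other prompt and output lengths, which the post does not claim and which usually is not true at deep context. The function name and the example workloads are hypothetical.

```python
# Back-of-envelope latency from the reported throughput figures.
# Assumption: rates stay constant regardless of context length.
PREFILL_TOK_S = 282.0  # reported prompt processing rate
DECODE_TOK_S = 20.7    # reported generation rate


def estimate_seconds(prompt_tokens: int, output_tokens: int) -> float:
    """Time to ingest the prompt plus time to generate the output."""
    return prompt_tokens / PREFILL_TOK_S + output_tokens / DECODE_TOK_S


if __name__ == "__main__":
    # Hypothetical workloads: (prompt tokens, output tokens)
    for prompt, output in [(1024, 256), (8000, 500)]:
        total = estimate_seconds(prompt, output)
        print(f"{prompt} in / {output} out: ~{total:.0f} s")
```

Because prefill is more than ten times faster than generation here, long prompts with short answers cost far less wall-clock time than the reverse.
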
// TAGS
qwen, qwen3.6, llama.cpp, vulkan, gguf, local-llm, amdgpu, radeon-780m, benchmark, moe

DISCOVERED: 2026-04-24 (45d ago)

PUBLISHED: 2026-04-24 (45d ago)

RELEVANCE: 9/10

AUTHOR: itroot