Qwen3.6-35B-A3B Hits 33 T/s on 6 GB VRAM

// 45d ago · BENCHMARK RESULT

This post benchmarks Qwen3.6-35B-A3B on an ASUS Zephyrus G14 (2020) with an RTX 2060 Max-Q 6 GB, Ryzen 4900HS, and 24 GB RAM. The author reports moving from about 12 tok/s to 22-33 tok/s by changing quantization, CPU offload behavior, and speculative decoding settings.
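
To put the reported range in perspective, the arithmetic can be sketched as below. The throughput figures come from the summary above; the 500-token response length is an assumption chosen only for illustration, and the helper names are hypothetical.

```python
# Throughput figures reported in the summary (tokens per second).
BASELINE_TPS = 12.0
TUNED_LOW_TPS = 22.0
TUNED_HIGH_TPS = 33.0


def speedup(new_tps: float, old_tps: float) -> float:
    """Return how many times faster new_tps is than old_tps."""
    return new_tps / old_tps


def seconds_for(tokens: int, tps: float) -> float:
    """Return the wall-clock seconds needed to generate `tokens` at `tps`."""
    return tokens / tps


if __name__ == "__main__":
    # Assumed response length, not a number from the post.
    response_tokens = 500
    for label, tps in [("baseline", BASELINE_TPS),
                       ("tuned (low)", TUNED_LOW_TPS),
                       ("tuned (high)", TUNED_HIGH_TPS)]:
        print(f"{label:>13}: {speedup(tps, BASELINE_TPS):.2f}x, "
              f"{seconds_for(response_tokens, tps):.1f} s per {response_tokens} tokens")
```

Under these numbers the tuned configuration is roughly 1.8x to 2.75x faster than the starting point.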

// ANALYSIS

Hot take: this reads less like a model benchmark and more like a reminder that local LLM performance is often an engineering problem disguised as a hardware problem.

  • The strongest practical finding is that clean IQ4_NL and APEX I-Compact beat “faster-looking” dynamic quants once CPU offload enters the picture.
  • Ngram speculative decoding appears unusually effective for coding-agent workloads, with the author reporting very high draft acceptance and a peak around 33 tok/s.
  • The automated overnight search didn’t uncover anything better than the manually tuned config, which suggests the last-mile gains were mostly from targeted human debugging.
  • The `--poll 0` result looks suspicious: it may be a measurement artifact or a workload-specific anomaly rather than a general optimization.
  • The battery result is the most surprising part: sustained 10 tok/s on a five-year-old laptop is genuinely useful for portable local inference.
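
For readers unfamiliar with the ngram speculative decoding mentioned above, the idea can be sketched in a few lines. This is a toy illustration of the general technique, not the author's configuration or any runtime's actual implementation: the function names, the `target_next` callback, and the cyclic stand-in model are all hypothetical. Draft tokens are copied from an earlier occurrence of the current n-token suffix, then kept only as long as they match what the target model would have produced.

```python
from typing import Callable, List, Sequence, Tuple


def ngram_draft(tokens: Sequence[int], n: int = 3, max_draft: int = 8) -> List[int]:
    """Propose draft tokens by finding the most recent earlier occurrence
    of the last n tokens and copying what followed it."""
    if len(tokens) <= n:
        return []
    key = tuple(tokens[-n:])
    # Scan backwards, starting just before the suffix itself.
    for start in range(len(tokens) - n - 1, -1, -1):
        if tuple(tokens[start:start + n]) == key:
            return list(tokens[start + n:start + n + max_draft])
    return []


def verify_draft(
    draft: Sequence[int],
    target_next: Callable[[List[int]], int],
    context: Sequence[int],
) -> Tuple[List[int], int]:
    """Greedy verification: keep draft tokens while they match the target.

    Returns (tokens to append, number of draft tokens accepted). A real
    runtime would check all draft positions in one batched forward pass;
    the sequential calls here only simulate that.
    """
    ctx = list(context)
    out: List[int] = []
    accepted = 0
    for tok in draft:
        expected = target_next(ctx)
        if tok != expected:
            out.append(expected)  # fall back to the target's own token
            return out, accepted
        out.append(tok)
        ctx.append(tok)
        accepted += 1
    return out, accepted


def cyclic_target(ctx: List[int]) -> int:
    """Stand-in 'model' that counts 0..4 in a loop (hypothetical)."""
    return (ctx[-1] + 1) % 5


if __name__ == "__main__":
    history = [0, 1, 2, 3, 4, 0, 1, 2]
    draft = ngram_draft(history, n=3)
    new_tokens, n_ok = verify_draft(draft, cyclic_target, history)
    print(f"draft={draft} accepted={n_ok}/{len(draft)} appended={new_tokens}")
```

Repetitive output makes the copied continuation likely to match, which is consistent with the high draft acceptance reported for coding-agent workloads, where identifiers and boilerplate recur often.
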
// TAGS
qwen, qwen3, local-first, llama.cpp, quantization, cpu-offload, speculative-decoding, moe, benchmark, laptop-inference

DISCOVERED: 2026-05-03 (45d ago)

PUBLISHED: 2026-05-03 (45d ago)

RELEVANCE: 8/10

AUTHOR: abhinand05