Qwen3.6-27B tok/s claims miss old CPUs

// 46d ago · BENCHMARK RESULT

A user says Qwen3.6-27B runs far slower than the tok/s numbers they see online, even with everything loaded into VRAM on a 3090 Ti. They report about 10 tok/s in llama.cpp and 18-19 tok/s in ik_llama.cpp at 50k context, then ask whether the slowdown is really caused by the model’s hybrid architecture and an older i9-9900K, or whether the CPU-bottleneck explanation is overstated.
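
To put the two reported figures on a common scale, here is a minimal sketch that converts decode throughput to per-token latency and computes the relative speedup. The 18.5 tok/s value is an assumed midpoint of the reported 18-19 tok/s range, and the helper names are illustrative only.

```python
def ms_per_token(tok_per_s: float) -> float:
    """Convert decode throughput (tokens/second) to latency (ms/token)."""
    if tok_per_s <= 0:
        raise ValueError("throughput must be positive")
    return 1000.0 / tok_per_s


def speedup(baseline_tok_s: float, other_tok_s: float) -> float:
    """Return how many times faster `other` is than `baseline`."""
    return other_tok_s / baseline_tok_s


llama_cpp_tok_s = 10.0      # "about 10 tok/s", as reported
ik_llama_cpp_tok_s = 18.5   # assumption: midpoint of the reported 18-19 tok/s

print(f"llama.cpp:    {ms_per_token(llama_cpp_tok_s):.0f} ms/token")
print(f"ik_llama.cpp: {ms_per_token(ik_llama_cpp_tok_s):.0f} ms/token")
print(f"speedup:      {speedup(llama_cpp_tok_s, ik_llama_cpp_tok_s):.2f}x")
```

Both figures are at 50k context, so they are comparable to each other, but not necessarily to short-context numbers posted elsewhere.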

// ANALYSIS

Hot take: the explanation is directionally plausible, but it is too absolute.

  • The official Qwen3.6-27B model card describes a hybrid `Gated DeltaNet + Gated Attention` layout, so it is not a plain dense transformer with a trivial all-GPU decode path.
  • `ik_llama.cpp` documents a faster `HAVE_FANCY_SIMD` path tied to AVX-VNNI/AVX-512-style support; Intel’s i9-9900K spec lists AVX2, not AVX-512 or VNNI.
  • That makes it believable that an older Coffee Lake CPU can bottleneck hybrid inference, especially in a backend that keeps part of the compute on the host.
  • The big comparison trap is context length: 50k context is far harsher than the short-context runs people often post online.
  • The higher numbers on Reddit are likely from a different mix of variables: shorter prompts, speculative decoding, different quantizations, newer CPUs, or a backend that avoids the same CPU-side work.
  • So this is probably not “gaslighting,” but it is almost certainly an apples-to-oranges benchmark comparison.
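
One way to check the CPU side of this on your own machine is to read the SIMD feature flags the kernel reports. The sketch below assumes Linux and its `/proc/cpuinfo` flag names (`avx2`, `avx512f`, `avx512_vnni`, `avx_vnni`). Which of these a given backend actually requires for its fast path is an assumption here and should be confirmed against that backend's own documentation.

```python
from pathlib import Path

# Linux /proc/cpuinfo flag names. Treating these four as the relevant set
# is an assumption for illustration, not taken from any backend's source.
FLAGS_OF_INTEREST = ("avx2", "avx512f", "avx512_vnni", "avx_vnni")


def parse_cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the CPU feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            # All cores normally report the same flags; the first entry suffices.
            return set(value.split())
    return set()


def simd_report(flags: set[str]) -> dict[str, bool]:
    """Map each flag of interest to whether the CPU advertises it."""
    return {name: name in flags for name in FLAGS_OF_INTEREST}


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        report = simd_report(parse_cpu_flags(cpuinfo.read_text()))
        for name, present in report.items():
            print(f"{name:12s} {'yes' if present else 'no'}")
    else:
        print("/proc/cpuinfo not found; this sketch only covers Linux.")
```

A CPU that reports `avx2` but none of the AVX-512 or VNNI flags matches the i9-9900K situation described above.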

// TAGS

qwen3-6-27b, qwen, llama.cpp, ik_llama.cpp, benchmark, tok/s, avx2, avx-vnni, long-context, gpu-inference

DISCOVERED: 2026-04-30 (46d ago)
PUBLISHED: 2026-04-30 (46d ago)
RELEVANCE: 9/10
AUTHOR: YourNightmar31