Qwen3.5-27B Slows Despite Dense Design

// 60d ago · INFRASTRUCTURE

A LocalLLaMA user is seeing only ~30 tok/s from Qwen3.5-27B on OpenRouter, even via the fastest listed provider. For a dense 27B model, that is less a surprise than a serving problem: VRAM fit, quantization, batching, and prompt prefill all matter, and the posted TTFT spikes suggest queueing is hurting more than raw decode speed.

// ANALYSIS

30 tok/s is not the real headline here; the 30-95 second TTFT is.
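To see why the wait dominates, here is a rough sketch of end-to-end latency as TTFT plus decode time. The 30 tok/s rate and the 30 and 95 second TTFT values come from the post; the 600-token reply length and the function names are assumptions chosen only for illustration.

```python
def end_to_end_seconds(ttft_s: float, output_tokens: int, decode_tok_per_s: float) -> float:
    """Total wall-clock time: wait for the first token, then stream the rest."""
    return ttft_s + output_tokens / decode_tok_per_s


def effective_tok_per_s(ttft_s: float, output_tokens: int, decode_tok_per_s: float) -> float:
    """Throughput as the user experiences it, with TTFT included."""
    return output_tokens / end_to_end_seconds(ttft_s, output_tokens, decode_tok_per_s)


if __name__ == "__main__":
    OUTPUT_TOKENS = 600  # hypothetical reply length, not from the post
    DECODE_RATE = 30.0   # tok/s, as reported in the post
    for ttft in (30.0, 95.0):  # TTFT range reported in the post
        total = end_to_end_seconds(ttft, OUTPUT_TOKENS, DECODE_RATE)
        eff = effective_tok_per_s(ttft, OUTPUT_TOKENS, DECODE_RATE)
        print(f"TTFT {ttft:>4.0f}s: total {total:.1f}s, effective {eff:.1f} tok/s")
```

Under these assumptions the decode phase takes 20 seconds either way, so a 95 second TTFT makes waiting the bulk of the request even though the decode rate never changed.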

  • Qwen3.5-27B is dense, so every generated token uses all 27B parameters. MoE models with far fewer active parameters can be dramatically faster at the same nominal size.
  • Qwen’s docs say Qwen3.5 defaults to thinking mode, which can add hidden reasoning work before the visible answer unless the provider disables it.
  • OpenRouter’s provider table reflects routed, shared serving rather than a clean single-GPU benchmark, so batching, queue depth, prompt length, and CPU offload can swing throughput a lot.
  • TTFT means time to first token, so those long waits include prompt processing and queueing as well as model compute.
  • On consumer cards, a 27B model often does not fit comfortably at useful precision, so once layers spill out of VRAM, token speed drops fast.
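The dense-versus-MoE and VRAM-fit points above can be put into rough numbers with a back-of-envelope sketch. It relies on a common rule of thumb that is not from the post: single-stream decoding is roughly memory-bandwidth bound, so the ceiling on tok/s is about bandwidth divided by the bytes of weights read per token. The 24 GB card, the 1000 GB/s bandwidth, the 2 GB overhead, and the 3B-active MoE are all hypothetical values, and batched provider serving will not follow this simple model.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Size of the weights alone in GB (decimal), ignoring KV cache and activations."""
    return params_billion * bits_per_weight / 8


def fits_in_vram(params_billion: float, bits_per_weight: float,
                 vram_gb: float, overhead_gb: float = 2.0) -> bool:
    """Crude fit check; overhead_gb is an assumed allowance for KV cache and runtime."""
    return weight_gb(params_billion, bits_per_weight) + overhead_gb <= vram_gb


def decode_ceiling_tok_per_s(active_params_billion: float, bits_per_weight: float,
                             bandwidth_gb_s: float) -> float:
    """Upper bound on single-stream decode speed if every active weight is read once per token."""
    return bandwidth_gb_s / weight_gb(active_params_billion, bits_per_weight)


if __name__ == "__main__":
    VRAM_GB = 24.0         # hypothetical consumer card
    BANDWIDTH_GB_S = 1000  # hypothetical memory bandwidth

    for bits in (16, 8, 4):
        size = weight_gb(27, bits)
        fits = fits_in_vram(27, bits, VRAM_GB)
        print(f"dense 27B @ {bits}-bit: {size:.1f} GB weights, fits in {VRAM_GB:.0f} GB: {fits}")

    dense = decode_ceiling_tok_per_s(27, 4, BANDWIDTH_GB_S)
    moe = decode_ceiling_tok_per_s(3, 4, BANDWIDTH_GB_S)  # hypothetical MoE with 3B active
    print(f"decode ceiling, dense 27B: {dense:.0f} tok/s; 3B-active MoE: {moe:.0f} tok/s")
```

The ceiling scales inversely with the parameters touched per token, which is why a dense 27B model trails an MoE with a small active set on the same hardware, and why spilling layers to slower memory cuts speed so sharply.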
// TAGS
qwen3-5-27b, llm, inference, gpu, benchmark, open-weights

DISCOVERED

60d ago

2026-03-29

PUBLISHED

61d ago

2026-03-28

RELEVANCE

8/10

AUTHOR

Deep_Row_8729