Qwen3.5-27B Slows Despite Dense Design
OPEN_SOURCE
REDDIT // 14d ago // INFRASTRUCTURE

A LocalLLaMA user is seeing only ~30 tok/s from Qwen3.5-27B on OpenRouter, even via the fastest listed provider. For a dense 27B model, that is less a surprise than a serving problem: VRAM fit, quantization, batching, and prompt prefill all matter, and the posted TTFT spikes suggest queueing is hurting more than raw decode speed.

// ANALYSIS

30 tok/s is not the real headline here; the 30-95 second TTFT is.
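Separating those two numbers is straightforward when testing a provider: time the stream yourself and split latency into TTFT and steady decode rate. A minimal sketch (the `fake_stream` generator is a hypothetical stand-in for a real streaming client):

```python
import time

def measure_ttft_and_tps(stream):
    """Split latency into time-to-first-token (queueing + prompt prefill)
    and steady decode rate over the tokens after the first."""
    start = time.perf_counter()
    first = last = start
    count = 0
    for _tok in stream:
        last = time.perf_counter()
        if count == 0:
            first = last
        count += 1
    ttft = first - start
    decode_s = last - first
    tps = (count - 1) / decode_s if decode_s > 0 else 0.0
    return ttft, tps

# Stand-in generator; a real client would yield streamed tokens from the API.
def fake_stream():
    time.sleep(0.05)          # simulated queue + prefill delay
    for _ in range(10):
        time.sleep(0.005)     # simulated per-token decode interval
        yield "tok"

ttft, tps = measure_ttft_and_tps(fake_stream())
```

With real provider traffic, a large TTFT alongside a normal decode rate points at queueing and prefill rather than slow token generation.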

  • Qwen3.5-27B is dense, so every generated token uses all 27B parameters. MoE models with far fewer active parameters can be dramatically faster at the same nominal size.
  • Qwen’s docs say Qwen3.5 defaults to thinking mode, which can add hidden reasoning work before the visible answer unless the provider disables it.
  • OpenRouter’s provider table is routed serving, not a clean single-GPU benchmark, so batching, queue depth, prompt length, and CPU offload can swing throughput a lot.
  • TTFT means time to first token, so those long waits include prompt processing and queueing as well as model compute.
  • On consumer cards, a 27B model often does not fit comfortably at useful precision, so once layers spill out of VRAM, token speed drops fast.
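The dense-model and VRAM points above have a simple back-of-the-envelope form: single-stream decode is roughly memory-bandwidth-bound, because every generated token streams all of the weights through the GPU once. A rough sketch with illustrative numbers, not a benchmark:

```python
def est_decode_tok_s(params_b, bytes_per_param, mem_bw_gb_s, efficiency=0.6):
    """Rough upper bound on single-stream decode speed:
    tok/s ~ effective memory bandwidth / model size in bytes."""
    model_gb = params_b * bytes_per_param
    return mem_bw_gb_s * efficiency / model_gb

# Dense 27B at FP16 on a ~1 TB/s card (hypothetical figures):
print(round(est_decode_tok_s(27, 2, 1000), 1))  # → 11.1 tok/s, single stream
```

At 8-bit precision the same card roughly doubles that bound, which is why quantization choices dominate dense-27B serving speed, and why an MoE model that activates only a fraction of its weights per token escapes the ceiling entirely.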
// TAGS
qwen3-5-27b · llm · inference · gpu · benchmark · open-weights

DISCOVERED

2026-03-29 (14d ago)

PUBLISHED

2026-03-28 (14d ago)

RELEVANCE

8/10

AUTHOR

Deep_Row_8729