Qwen 3.5 27B hits 2,000 TPS

// 88d ago · BENCHMARK RESULT

A LocalLLaMA user reports roughly 2,000 tokens/sec of prefill throughput on a markdown-document classification task, running an Unsloth Q5_K_XL GGUF build of Qwen 3.5 27B on an RTX 5090 with a CUDA 13 build of llama.cpp. The setup is tuned for long inputs, minimal outputs, and batch parallelism, which suits high-volume classification but makes the figure highly workload-specific.
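To put the headline number in workload terms, here is a rough back-of-the-envelope sketch in Python. It assumes the reported ~2,000 tokens/sec is aggregate prefill throughput across the whole batch and that output is short enough to ignore by default. The 4,000-token document length is a hypothetical value chosen for illustration, not a figure from the post, and `docs_per_hour` is an illustrative helper rather than part of any tool.

```python
from typing import Optional


def docs_per_hour(prefill_tps: float,
                  input_tokens: int,
                  output_tokens: int = 0,
                  decode_tps: Optional[float] = None) -> float:
    """Estimate documents classified per hour for a prefill-heavy workload.

    prefill_tps   -- aggregate prompt-processing throughput (tokens/sec)
    input_tokens  -- average prompt length per document (hypothetical)
    output_tokens -- average generated tokens per document (0 = ignore decode)
    decode_tps    -- aggregate generation throughput, required if output_tokens > 0
    """
    if prefill_tps <= 0 or input_tokens <= 0:
        raise ValueError("prefill_tps and input_tokens must be positive")

    # Time spent reading the document (prefill).
    seconds = input_tokens / prefill_tps

    # Optional time spent writing the label (decode).
    if output_tokens > 0:
        if decode_tps is None or decode_tps <= 0:
            raise ValueError("decode_tps must be positive when output_tokens > 0")
        seconds += output_tokens / decode_tps

    return 3600.0 / seconds


# ~2,000 tok/s prefill (from the post), 4,000-token documents (assumed).
estimate = docs_per_hour(prefill_tps=2000, input_tokens=4000)
print(f"{estimate:.0f} documents/hour")
```

Passing a non-zero `output_tokens` with a measured `decode_tps` shows how quickly the estimate drops once generation is no longer negligible, which is why the figure does not transfer to long-form generation.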

// ANALYSIS

This is a strong real-world throughput datapoint for local inference, but it should be read as a specialized benchmark rather than a general performance baseline.

  • The reported speed is dominated by input-heavy prefill, not long-form generation throughput.
  • Disabling vision/mmproj and using “no thinking” removed extra compute paths for this text-only task.
  • Reducing context to 128k and matching parallelism to the batch size (8) kept VRAM pressure under control.
  • The author notes evals are still partial, so accuracy and quality tradeoffs need fuller validation.
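The context and parallelism settings above interact. As a minimal sketch, assume the server divides the total context evenly across parallel slots (a common llama.cpp server behavior, though it can differ by version and configuration) and that "128k" means 128 × 1024 tokens. Under those assumptions, each of the 8 slots gets the budget computed below. The helper names and the 16-token output reserve are hypothetical.

```python
def context_per_slot(total_context: int, parallel_slots: int) -> int:
    """Tokens available to each request if context is split evenly across slots."""
    if total_context <= 0 or parallel_slots < 1:
        raise ValueError("total_context must be positive and parallel_slots >= 1")
    return total_context // parallel_slots


def fits_in_slot(doc_tokens: int, slot_context: int, output_reserve: int = 16) -> bool:
    """Check that a document plus a small output reserve fits in one slot.

    output_reserve is a hypothetical allowance for the short classification label.
    """
    return doc_tokens + output_reserve <= slot_context


TOTAL_CONTEXT = 128 * 1024   # assumed meaning of "128k"
PARALLEL_SLOTS = 8           # parallelism matched to batch size, per the post

slot = context_per_slot(TOTAL_CONTEXT, PARALLEL_SLOTS)
print(f"per-slot context: {slot} tokens")
print("12,000-token doc fits:", fits_in_slot(12_000, slot))
print("20,000-token doc fits:", fits_in_slot(20_000, slot))
```

If this assumption holds for a given build, documents longer than the per-slot budget would need truncation, chunking, or a lower slot count, each of which changes the throughput picture.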
// TAGS
qwen3-5-27b, llm, inference, gpu, benchmark, llama-cpp

DISCOVERED

88d ago

2026-03-14

PUBLISHED

88d ago

2026-03-13

RELEVANCE

8/10

AUTHOR

awitod