Qwen3 Coder hits 1,207 tok/s
REDDIT // 37d ago // BENCHMARK RESULT

CloudRift published a reproducible benchmark and deployment guide showing Qwen3 Coder reaching 1,157 tok/s on an RTX 5090 and 1,207 tok/s peak throughput on an RTX PRO 6000, with vLLM ultimately outperforming SGLang under sustained load. The post also opens DeploDock’s GitHub-based benchmarking infrastructure to community PRs through March, making these sweeps easier to rerun and compare.

// ANALYSIS

The bigger story is not just that Qwen3 Coder runs fast on Blackwell GPUs, but that someone finally packaged the tuning process into shareable recipes instead of one-off forum screenshots.

  • vLLM crushed SGLang on the 5090 AWQ setup, delivering 2.7x higher output throughput and much better TTFT for Qwen3-Coder-30B-A3B-Instruct-AWQ
  • The 32GB RTX 5090 held roughly 115K context without throughput collapse, while the 96GB PRO 6000 handled the full 262K context window with no measurable degradation
  • On the PRO 6000, SGLang won the low-concurrency comparison, but vLLM pulled well ahead at higher concurrency and peaked at 1,207 tok/s once concurrency was pushed to 40
  • DeploDock looks like the real infrastructure play here: recipes, benchmark matrices, and GitHub Actions results turn LLM serving optimization into something teams can version, review, and replicate
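
For readers less familiar with the metrics in the bullets above, here is a minimal sketch of how aggregate output throughput (tok/s) and time to first token (TTFT) are commonly derived from per-request timings in a concurrency sweep. It is not taken from CloudRift's or DeploDock's tooling; `RequestRecord` and `summarize` are hypothetical names used only for illustration, and the sample numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class RequestRecord:
    """Timing for one completed request (hypothetical record layout)."""
    start: float          # seconds, when the request was sent
    first_token: float    # seconds, when the first output token arrived
    end: float            # seconds, when the last output token arrived
    output_tokens: int    # number of generated tokens


def summarize(records):
    """Aggregate output throughput and mean TTFT over concurrent requests."""
    # Wall-clock span covering every request in the run.
    wall = max(r.end for r in records) - min(r.start for r in records)
    total_tokens = sum(r.output_tokens for r in records)
    ttfts = [r.first_token - r.start for r in records]
    return {
        "output_tok_per_s": total_tokens / wall,
        "mean_ttft_s": sum(ttfts) / len(ttfts),
    }


if __name__ == "__main__":
    # Two overlapping requests with illustrative (made-up) timings.
    sample = [
        RequestRecord(start=0.0, first_token=0.5, end=2.0, output_tokens=100),
        RequestRecord(start=0.0, first_token=1.0, end=4.0, output_tokens=300),
    ]
    print(summarize(sample))
```

Dividing total generated tokens by the wall-clock span, rather than averaging per-request rates, is what lets throughput keep rising with concurrency even as individual requests slow down.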
// TAGS
qwen3-coder · llm · inference · benchmark · open-source · cloud

DISCOVERED

37d ago

2026-03-06

PUBLISHED

37d ago

2026-03-06

RELEVANCE

8/10

AUTHOR

NoVibeCoding