YOU ARE VIEWING ONE ITEM FROM THE AICRIER FEED

Qwen3.5 vLLM Docker config lands on 6000 Pro

AICrier tracks AI developer news across Product Hunt, GitHub, Hacker News, YouTube, X, arXiv, and more. This page keeps the article you opened front and center while giving you a path into the live feed.

// WHAT AICRIER DOES

7+

TRACKED FEEDS

24/7

SCRAPED FEED

Short summaries, external links, screenshots, relevance scoring, tags, and featured picks for AI builders.

Qwen3.5 vLLM Docker config lands on 6000 Pro
OPEN LINK ↗
// 67d agoINFRASTRUCTURE

Qwen3.5 vLLM Docker config lands on 6000 Pro

Ian Hailey's GitHub repo packages a Docker/vLLM stack for serving Sehyo/Qwen3.5-122B-A10B-NVFP4 on a single RTX PRO 6000 Blackwell with a 262k-token context window. It adds Multi-Token Prediction (MTP) speculative decoding and a GPU-specific tuning script, turning the setup into a reproducible inference recipe rather than a one-off config dump.

// ANALYSIS

This is a useful field report, not a splashy launch: it shows how much work it still takes to get a huge NVFP4 MoE to behave on Blackwell, and why nightly software plus hardware-specific tuning are still part of the game. The payoff is that the repo bundles the messy bits into something other Blackwell owners can actually reproduce.

  • The repo proves that a 122B MoE with a 262k context window can be made practical on a single workstation-class Blackwell GPU, which is a meaningful milestone for local serving.
  • `vllm/vllm-openai:nightly` plus a forced Transformers reinstall suggests support for these quantized Qwen builds is still moving fast and not fully settled in stable releases.
  • The optional `mtp_tune` workflow brute-forces 1,920 fused MoE kernel configurations and writes a device-specific JSON, so this is as much a calibration tool as a Docker example.
  • Benchmark notes show the tradeoff clearly: MTP improves single-request token latency, but two-request concurrency can hammer throughput, so this is tuned for solo interactive use rather than shared traffic.
  • Since the repo is public, it gives other Blackwell owners a practical starting point instead of forcing them to reverse-engineer the stack.
// TAGS
vllm-docker-qwen3-5-122b-a10b-nvfp4llminferencegpuself-hostedbenchmark

DISCOVERED

67d ago

2026-03-22

PUBLISHED

67d ago

2026-03-22

RELEVANCE

8/ 10

AUTHOR

1-a-n