Qwen3.5 vLLM Docker config lands on 6000 Pro
Ian Hailey's GitHub repo packages a Docker/vLLM stack for serving Sehyo/Qwen3.5-122B-A10B-NVFP4 on a single RTX PRO 6000 Blackwell with a 262k-token context window. It adds Multi-Token Prediction (MTP) speculative decoding and a GPU-specific tuning script, turning the setup into a reproducible inference recipe rather than a one-off config dump.
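A launch command along these lines conveys the shape of the stack. This is a hedged sketch, not the repo's actual compose file: the flag names follow vLLM's documented serve interface, but the exact values and especially the MTP `--speculative-config` payload are assumptions to check against the repo.

```shell
# Sketch: serve the NVFP4 build via the nightly vLLM OpenAI image.
# Flag values and the speculative-config JSON are illustrative assumptions,
# not copied from Ian Hailey's repo -- consult the repo for the real config.
docker run --rm --gpus all --ipc=host -p 8000:8000 \
  vllm/vllm-openai:nightly \
  --model Sehyo/Qwen3.5-122B-A10B-NVFP4 \
  --max-model-len 262144 \
  --gpu-memory-utilization 0.95 \
  --speculative-config '{"method": "mtp", "num_speculative_tokens": 1}'
```

The 262144 value is the 262k-token context window from the summary; the repo additionally layers its tuning script and Transformers reinstall on top of this base image.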
This is a useful field report, not a splashy launch: it shows how much work it still takes to get a huge NVFP4 MoE to behave on Blackwell, and why nightly software plus hardware-specific tuning are still part of the game. The payoff is that the repo bundles the messy bits into something other Blackwell owners can actually reproduce.
- The repo proves that a 122B MoE with a 262k context window can be made practical on a single workstation-class Blackwell GPU, which is a meaningful milestone for local serving.
- `vllm/vllm-openai:nightly` plus a forced Transformers reinstall suggests support for these quantized Qwen builds is still moving fast and not fully settled in stable releases.
- The optional `mtp_tune` workflow brute-forces 1,920 fused MoE kernel configurations and writes a device-specific JSON, so this is as much a calibration tool as a Docker example.
- Benchmark notes show the tradeoff clearly: MTP improves single-request token latency, but two-request concurrency can hammer throughput, so this is tuned for solo interactive use rather than shared traffic.
- Since the repo is public, it gives other Blackwell owners a practical starting point instead of forcing them to reverse-engineer the stack.
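The `mtp_tune` idea, sweep candidate kernel configurations, score each one, and persist the per-device winner as JSON, can be sketched generically. Everything below is invented for illustration (config fields, cost model, filename); the real script times 1,920 actual fused-MoE kernel configurations on the GPU rather than evaluating a stand-in formula.

```shell
# Hypothetical sketch of a brute-force kernel-config sweep. The repo's
# mtp_tune benchmarks real fused-MoE kernels; here a deterministic
# stand-in cost model replaces the GPU timing step.
dev="rtx_pro_6000"
best_cost=999999
best_cfg=""
for bm in 16 32 64 128; do
  for bn in 64 128 256; do
    for warps in 4 8; do
      # Stand-in cost (a real tuner would launch the kernel with this
      # config and measure latency instead of computing a formula).
      cost=$(( (bm > 64 ? bm - 64 : 64 - bm) + (bn > 128 ? bn - 128 : 128 - bn) + 16 / warps ))
      if [ "$cost" -lt "$best_cost" ]; then
        best_cost=$cost
        best_cfg="$bm $bn $warps"
      fi
    done
  done
done
# Persist a device-specific JSON, mirroring what the repo's tuning step does.
set -- $best_cfg
printf '{"device":"%s","block_m":%s,"block_n":%s,"num_warps":%s}\n' \
  "$dev" "$1" "$2" "$3" > "moe_config_${dev}.json"
echo "best config: $best_cfg (cost $best_cost)"
```

The point of the pattern is the output artifact: because the winning configuration depends on the specific GPU, the JSON is keyed by device name and reused at serve time instead of re-tuning on every launch.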
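The latency-versus-concurrency tradeoff in the benchmark notes can be probed crudely against a running server by timing one request versus two concurrent ones. The endpoint URL and payload below are assumptions (a local vLLM OpenAI-compatible server); a real comparison should use a proper load-testing tool and measure per-token latency, not wall-clock time.

```shell
# Assumed local OpenAI-compatible endpoint; payload is illustrative only.
REQ='{"model": "Sehyo/Qwen3.5-122B-A10B-NVFP4", "prompt": "Hello", "max_tokens": 128}'

# Single request: the interactive case MTP is tuned for here.
time curl -s http://localhost:8000/v1/completions \
  -H 'Content-Type: application/json' -d "$REQ" > /dev/null

# Two concurrent requests: the case where the notes report throughput dropping.
time ( for i in 1 2; do
  curl -s http://localhost:8000/v1/completions \
    -H 'Content-Type: application/json' -d "$REQ" > /dev/null &
done; wait )
```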
DISCOVERED: 2026-03-22
PUBLISHED: 2026-03-22
AUTHOR: 1-a-n