Custom Blackwell server hits 198 tokens/second
REDDIT // 2d ago // BENCHMARK RESULT

Visual_Synthesizer published a benchmark of a dual RTX 6000 Blackwell inference server that reaches 198 tokens/second on Qwen3.5-122B. The build combines a PCIe switch with speculative decoding and, according to the post, rivals enterprise H100 performance at a lower cost.

// ANALYSIS

The project suggests that careful system architecture and low-latency interconnects can push prosumer hardware toward enterprise-class throughput. Placing both GPUs behind a single PCIe switch (the PIX topology) indicates that, for tensor-parallel decoding of an MoE model, reducing interconnect latency matters more than adding bandwidth. NEXTN speculative decoding contributes a substantial additional speedup.
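To illustrate why latency can outweigh bandwidth during decoding, here is a minimal sketch using a simple "fixed latency plus size over bandwidth" cost model. All numbers (hidden size, link latency, link bandwidth) and the helper name `transfer_time_us` are hypothetical placeholders chosen for illustration. They are not measurements from the benchmark or the actual Qwen3.5-122B configuration.

```python
def transfer_time_us(message_bytes, latency_us, bandwidth_gb_s):
    """Estimate one GPU-to-GPU transfer: fixed latency plus size / bandwidth."""
    wire_time_us = message_bytes / (bandwidth_gb_s * 1e9) * 1e6
    return latency_us + wire_time_us


# Hypothetical per-token message exchanged between GPUs in one decode step:
# a single hidden-state vector stored in 16-bit precision.
hidden_size = 4096                 # assumed value, for illustration only
message_bytes = hidden_size * 2    # 2 bytes per element

# Hypothetical link parameters (not taken from the post).
baseline = transfer_time_us(message_bytes, latency_us=10.0, bandwidth_gb_s=32.0)
half_latency = transfer_time_us(message_bytes, latency_us=5.0, bandwidth_gb_s=32.0)
double_bandwidth = transfer_time_us(message_bytes, latency_us=10.0, bandwidth_gb_s=64.0)

print(f"baseline:         {baseline:.3f} us")
print(f"half latency:     {half_latency:.3f} us")
print(f"double bandwidth: {double_bandwidth:.3f} us")
```

Under these assumed numbers the message is only a few kilobytes, so the fixed latency term dominates: halving latency nearly halves the transfer time, while doubling bandwidth barely changes it. Larger messages, such as those in prefill, would shift the balance back toward bandwidth.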

// TAGS
llm, inference, qwen-3-5, rtx-6000, blackwell, sglang, benchmark, pcie-switch, speculative-decoding, fp4, moe, nvidia, localllama, rtx6kpro

DISCOVERED

2d ago

2026-04-10

PUBLISHED

2d ago

2026-04-10

RELEVANCE

9/10

AUTHOR

Visual_Synthesizer