OPEN_SOURCE ↗
REDDIT // 2d ago // BENCHMARK RESULT
Custom Blackwell server hits 198 tokens/second
Visual_Synthesizer released a benchmark for a dual RTX 6000 Blackwell inference server that reaches 198 tokens/second on Qwen3.5-122B. The build pairs a PCIe switch with speculative decoding and is presented as rivaling enterprise H100 performance at a lower cost.
// ANALYSIS
This project shows how architectural optimization and a low-latency interconnect can push prosumer hardware toward enterprise territory. The PIX topology, in which the GPUs communicate through a single PCIe switch, suggests that latency matters more than bandwidth for MoE tensor-parallel decoding, while NEXTN speculative decoding adds a substantial further speedup.
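To make the latency-versus-bandwidth point concrete, here is a minimal Python sketch under stated assumptions. It uses the generic latency-plus-size/bandwidth cost model for a single GPU-to-GPU transfer, and the textbook expected-tokens formula for speculative decoding with an independent per-token acceptance rate. The function names and every number are hypothetical illustrations; none are measurements from the benchmark, and the second formula is not a description of how NEXTN works internally.

```python
def transfer_time_us(message_bytes: float, latency_us: float, bandwidth_gb_s: float) -> float:
    """Simple cost model: fixed link latency plus size divided by bandwidth."""
    # 1 GB/s equals 1e3 bytes per microsecond.
    return latency_us + message_bytes / (bandwidth_gb_s * 1e3)


def latency_share(message_bytes: float, latency_us: float, bandwidth_gb_s: float) -> float:
    """Fraction of the transfer time spent on fixed latency."""
    return latency_us / transfer_time_us(message_bytes, latency_us, bandwidth_gb_s)


def expected_tokens_per_step(accept_rate: float, draft_len: int) -> float:
    """Expected tokens emitted per verification pass.

    Assumes each drafted token is accepted independently with
    probability accept_rate, which is a simplification.
    """
    if accept_rate >= 1.0:
        return float(draft_len + 1)
    return (1.0 - accept_rate ** (draft_len + 1)) / (1.0 - accept_rate)


if __name__ == "__main__":
    # Hypothetical values: during decode, each tensor-parallel sync moves
    # only a small activation buffer, so the fixed latency term dominates.
    small = 16 * 1024  # bytes, illustrative per-layer message size
    for latency in (2.0, 10.0):  # microseconds, illustrative
        share = latency_share(small, latency, bandwidth_gb_s=50.0)
        print(f"latency={latency} us -> latency share of transfer: {share:.0%}")

    # Hypothetical acceptance rate and draft length.
    print(f"tokens per step: {expected_tokens_per_step(0.8, 3):.3f}")
```

With messages this small, raising bandwidth barely changes the transfer time in this model, whereas cutting latency reduces it almost proportionally, which matches the intuition behind the analysis above.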
// TAGS
llm · inference · qwen-3-5 · rtx-6000 · blackwell · sglang · benchmark · pcie-switch · speculative-decoding · fp4 · moe · nvidia · localllama · rtx6kpro
DISCOVERED
2d ago
2026-04-10
PUBLISHED
2d ago
2026-04-10
RELEVANCE
9/10
AUTHOR
Visual_Synthesizer