Custom Blackwell server hits 198 tokens/second
Visual_Synthesizer published a benchmark of a dual RTX 6000 Blackwell inference server that reaches 198 tokens/second on Qwen3.5-122B. The build combines a PCIe switch with speculative decoding to rival enterprise H100 performance at lower cost.
This project suggests that architectural optimization and low-latency interconnects can push prosumer hardware toward enterprise territory. Placing both GPUs behind a PCIe switch (a PIX topology, in which peer-to-peer traffic crosses at most a single PCIe bridge) indicates that, for MoE tensor-parallel decoding, reducing latency matters more than adding bandwidth, while NEXTN speculative decoding provides a substantial further throughput uplift.
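The two claims above can be illustrated with a rough back-of-the-envelope sketch. It assumes a simple latency-plus-size/bandwidth cost model for each inter-GPU transfer and an independent per-token acceptance probability for speculative decoding. The function names and every number below are hypothetical and chosen only for illustration; none of them come from the benchmark.

```python
def transfer_time_us(message_bytes, latency_us, bandwidth_gb_s):
    """Cost of one transfer: fixed link latency plus size / bandwidth.

    This is a simplified model (an assumption), not a measurement.
    """
    return latency_us + message_bytes / (bandwidth_gb_s * 1e9) * 1e6


def expected_tokens_per_step(acceptance_rate, draft_len):
    """Expected tokens emitted per verification step.

    Assumes each drafted token is accepted independently with the
    same probability, which real workloads only approximate.
    """
    a = acceptance_rate
    if a >= 1.0:
        return float(draft_len + 1)
    return (1 - a ** (draft_len + 1)) / (1 - a)


# Hypothetical small decode-time message: 16 KiB per transfer.
msg = 16 * 1024

baseline = transfer_time_us(msg, latency_us=2.0, bandwidth_gb_s=32.0)
half_latency = transfer_time_us(msg, latency_us=1.0, bandwidth_gb_s=32.0)
double_bandwidth = transfer_time_us(msg, latency_us=2.0, bandwidth_gb_s=64.0)

print(f"baseline:         {baseline:.3f} us")
print(f"half latency:     {half_latency:.3f} us")
print(f"double bandwidth: {double_bandwidth:.3f} us")

# Hypothetical draft settings: 3 drafted tokens, 80% acceptance.
print(f"tokens per step:  {expected_tokens_per_step(0.8, 3):.3f}")
```

With messages this small, the fixed latency term dominates the transfer time, so under these assumed numbers halving latency helps more than doubling bandwidth. The second function shows why accepted draft tokens raise throughput: each verification pass of the large model can emit more than one token.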
DISCOVERED
2026-04-10
PUBLISHED
2026-04-10
AUTHOR
Visual_Synthesizer