OPEN_SOURCE
REDDIT // 1d ago // BENCHMARK RESULT
RTX PRO 6000 Blackwell tops 4080 Super
A Redditor reports that a borrowed RTX PRO 6000 rig dramatically outperformed their RTX 4080 Super in LM Studio. Running Qwen 3.6 27B, the 4080 Super managed about 6 tokens/sec on a Q2 quant with roughly 60 seconds time to first token (TTFT), while the RTX PRO 6000 reached about 67 tokens/sec on a Q8 quant with around 1 second TTFT. The post frames the result as an eye-opener for local inference, suggesting the pro card’s much larger memory and workstation-class bandwidth suit big models better than the consumer GPU does.
// ANALYSIS
Hot take: this looks less like a small generational bump and more like the difference between “can run the model” and “can run it well.”
- The reported gain is huge on both throughput and first-token latency, which usually points to memory capacity, memory bandwidth, and quantization headroom rather than raw compute alone.
- A 27B model at Q8 on the RTX PRO card is a much more demanding test than a Q2 quant on the 4080 Super, so the two runs are not a like-for-like comparison, but the speedup is still striking.
- This is exactly the kind of workload where workstation GPUs justify their price: large VRAM, higher sustained performance, and fewer compromises on quant choice.
- The M5 Ultra comparison is the right next question, but this benchmark already suggests that local LLM builders who want premium model quality will keep caring a lot about pro GPU memory tiers.
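The memory-capacity argument in the bullets above can be made concrete with a back-of-the-envelope estimate. The sketch below is illustrative only and rests on several assumptions that are not in the post: nominal 2 and 8 bits per weight for the Q2 and Q8 quants (real quant formats store extra scale metadata, so actual files are somewhat larger), a hypothetical fixed 2 GiB allowance for KV cache and runtime overhead, and the published 16 GB and 96 GB VRAM capacities of the RTX 4080 Super and RTX PRO 6000 Blackwell. The helper names `weight_gib` and `fits` are made up for this example.

```python
# Rough weight-memory estimate. Bits-per-weight values are nominal
# assumptions, not measured file sizes from LM Studio.
GIB = 1024 ** 3


def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB."""
    return params_billion * 1e9 * bits_per_weight / 8 / GIB


def fits(params_billion: float, bits_per_weight: float,
         vram_gib: float, overhead_gib: float = 2.0) -> bool:
    """True if weights plus an assumed fixed overhead fit in VRAM."""
    return weight_gib(params_billion, bits_per_weight) + overhead_gib <= vram_gib


# Assumed VRAM capacities (published specs, not stated in the post).
CARDS = {"RTX 4080 Super": 16, "RTX PRO 6000": 96}

for label, bits in [("Q2", 2.0), ("Q8", 8.0)]:
    size = weight_gib(27, bits)
    verdicts = ", ".join(
        f"{card}: {'fits' if fits(27, bits, vram) else 'does not fit'}"
        for card, vram in CARDS.items()
    )
    print(f"27B at {label}: ~{size:.1f} GiB of weights ({verdicts})")
```

Under these assumptions, the Q8 weights alone come to roughly 25 GiB, which exceeds the consumer card's VRAM and would force part of the model into system memory, while the pro card holds them with ample room to spare. This estimate does not by itself explain the exact throughput and TTFT figures reported.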
// TAGS
nvidia, rtx-pro-6000, blackwell, gpu, local-first, lm-studio, qwen, benchmark
DISCOVERED
1d ago
2026-05-02
PUBLISHED
1d ago
2026-05-01
RELEVANCE
8/10
AUTHOR
LargelyInnocuous