RTX PRO 6000 Blackwell tops 4080 Super
REDDIT · 1d ago · BENCHMARK RESULT

A Redditor reports that a borrowed RTX PRO 6000 rig dramatically outperformed their RTX 4080 Super in LM Studio running Qwen 3.6 27B. On the 4080 Super, a Q2 quant managed about 6 tokens/sec with roughly 60 seconds time to first token (TTFT); on the RTX PRO 6000, a Q8 quant reached about 67 tokens/sec with around 1 second TTFT. The post frames the result as an eye-opener for local inference, suggesting the pro card’s much larger memory and workstation-class bandwidth are a better fit for big models than the consumer GPU.

// ANALYSIS

Hot take: this looks less like a small generational bump and more like the difference between “can run the model” and “can run it well.”

  • The reported gain is huge on both throughput and first-token latency, which usually points to memory capacity/bandwidth and quantization headroom, not just raw compute.
  • The comparison is not like-for-like: a 27B model at Q8 on the RTX PRO card is a much more demanding workload than a Q2 quant on the 4080 Super, yet the pro card is still far faster while running the higher-quality quant.
  • This is exactly the kind of workload where workstation GPUs justify their price: large VRAM, higher sustained performance, and fewer compromises on quant choice.
  • The M5 Ultra comparison is the right next question, but this benchmark already suggests that local LLM builders who want premium model quality will keep caring a lot about pro GPU memory tiers.
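
To make the "quantization headroom" point concrete, here is a rough back-of-the-envelope sketch of weight footprint versus VRAM. Nothing in it comes from the Reddit post: the bits-per-weight values are approximate nominal figures for GGUF-style Q2 and Q8 quants, the VRAM sizes (16 GB and 96 GB) are taken from public spec sheets, and the 2 GiB overhead for KV cache and runtime buffers is a placeholder guess. The function names are hypothetical.

```python
# Rough weight-footprint estimate. Every constant here is an assumption
# for illustration, not a number reported in the original post.
GIB = 1024 ** 3

# Approximate bits per weight for two GGUF-style quant levels (assumed).
BITS_PER_WEIGHT = {"Q2": 2.6, "Q8": 8.5}

# VRAM per card in GiB (assumed from public spec sheets).
VRAM_GIB = {"RTX 4080 Super": 16, "RTX PRO 6000 Blackwell": 96}


def weight_gib(params: float, bits_per_weight: float) -> float:
    """Size of the quantized weights alone, in GiB."""
    return params * bits_per_weight / 8 / GIB


def fits(params: float, quant: str, card: str, overhead_gib: float = 2.0) -> bool:
    """True if weights plus a guessed overhead fit entirely in VRAM."""
    needed = weight_gib(params, BITS_PER_WEIGHT[quant]) + overhead_gib
    return needed <= VRAM_GIB[card]


PARAMS = 27e9  # a 27B-parameter dense model

for quant in BITS_PER_WEIGHT:
    size = weight_gib(PARAMS, BITS_PER_WEIGHT[quant])
    for card in VRAM_GIB:
        verdict = "fits" if fits(PARAMS, quant, card) else "does not fit"
        print(f"{quant} ({size:.1f} GiB weights) {verdict} on {card}")
```

Under these assumptions, the Q8 weights alone exceed the 16 GB card but leave plenty of room on the 96 GB card, which is one plausible reading of why the consumer GPU was pushed down to a Q2 quant. The estimate ignores context length, so real-world usage can land well above the weights-only figure.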
// TAGS
nvidia, rtx-pro-6000, blackwell, gpu, local-first, lm-studio, qwen, benchmark

DISCOVERED: 1d ago (2026-05-02)
PUBLISHED: 1d ago (2026-05-01)
RELEVANCE: 8/10
AUTHOR: LargelyInnocuous