RTX PRO 6000 Blackwell tops 4080 Super

// 50d ago · BENCHMARK RESULT

A Redditor reports that a borrowed RTX PRO 6000 rig dramatically outperformed their RTX 4080 Super in LM Studio. Running Qwen 3.6 27B, the 4080 Super managed about 6 tokens/sec with roughly 60 seconds time to first token (TTFT) on a Q2 quant, while the RTX PRO 6000 reached about 67 tokens/sec with around 1 second TTFT on a Q8 quant. The post frames the result as an eye-opener for local inference, suggesting the pro card's much larger memory and workstation-class bandwidth are a better fit for big models than the consumer GPU.

// ANALYSIS

Hot take: this looks less like a small generational bump and more like the difference between “can run the model” and “can run it well.”

– The reported gain is large on both throughput (about 6 to about 67 tokens/sec) and first-token latency (roughly 60 seconds to around 1 second), which usually points to memory capacity, bandwidth, and quantization headroom rather than raw compute alone.
– The two runs are not like-for-like: a 27B model at Q8 on the RTX PRO card is a much more demanding workload than a Q2 quant on the 4080 Super, so the pro card posted the speedup while also running the heavier, higher-quality quant.
– This is exactly the kind of workload where workstation GPUs justify their price: large VRAM, higher sustained performance, and fewer compromises on quant choice.
– The M5 Ultra comparison is the right next question, but this benchmark already suggests that local LLM builders who want premium model quality will keep caring a lot about pro GPU memory tiers.
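
To make the memory-capacity argument concrete, here is a rough back-of-envelope sketch in Python. It is not from the original post. The bits-per-weight figures (about 8.5 for Q8, about 2.6 for Q2), the 2 GiB overhead allowance, and the VRAM sizes (16 GB for the RTX 4080 Super, 96 GB for the RTX PRO 6000 Blackwell, taken from published specs) are all assumptions used for illustration. The estimate covers model weights only; KV cache and context length, which the post does not report, add to the total.

```python
def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def fits(params_billion: float, bits_per_weight: float,
         vram_gib: float, overhead_gib: float = 2.0) -> bool:
    """True if weights plus an assumed fixed overhead fit in VRAM."""
    return weight_gib(params_billion, bits_per_weight) + overhead_gib <= vram_gib


# Assumed effective bits per weight; real values vary by quant scheme.
Q8_BITS = 8.5
Q2_BITS = 2.6

# Assumed VRAM sizes from published specs, treated as GiB for simplicity.
VRAM_4080_SUPER = 16
VRAM_RTX_PRO_6000 = 96

q8_size = weight_gib(27, Q8_BITS)
q2_size = weight_gib(27, Q2_BITS)

# Ratios computed from the figures reported in the post.
throughput_ratio = 67 / 6   # tokens/sec, PRO 6000 vs 4080 Super
ttft_ratio = 60 / 1         # seconds, 4080 Super vs PRO 6000

if __name__ == "__main__":
    print(f"27B @ Q8 weights: ~{q8_size:.1f} GiB")
    print(f"27B @ Q2 weights: ~{q2_size:.1f} GiB")
    print("Q8 fits on 16 GB card:", fits(27, Q8_BITS, VRAM_4080_SUPER))
    print("Q8 fits on 96 GB card:", fits(27, Q8_BITS, VRAM_RTX_PRO_6000))
    print(f"Throughput ratio: ~{throughput_ratio:.1f}x")
    print(f"TTFT ratio: ~{ttft_ratio:.0f}x")
```

Under these assumptions, the Q8 weights alone exceed the 16 GB card but leave ample room on the 96 GB card, which is consistent with the "quantization headroom" point above. The sketch does not by itself explain the slow Q2 figures on the 4080 Super, since the post gives no details on context length or offload settings.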

// TAGS

nvidia, rtx-pro-6000, blackwell, gpu, local-first, lm-studio, qwen, benchmark

DISCOVERED

50d ago

2026-05-02

PUBLISHED

50d ago

2026-05-01

RELEVANCE

8/10

AUTHOR

LargelyInnocuous
