OPEN_SOURCE ↗
REDDIT // 3d ago // INFRASTRUCTURE
Dual Quadro RTX 6000s with NVLink look like a capacity-first upgrade for Gemma 31B, not a pure speed win.
This Reddit post asks whether two NVIDIA Quadro RTX 6000 GPUs with an NVLink bridge are worth $1300 for local LLM work, specifically to run Gemma 31B faster than a 4x3060 setup. The underlying hardware is NVIDIA’s Turing-era workstation card with 24GB of GDDR6 per GPU, and NVLink can expose a combined 48GB memory pool with up to 100 GB/s of interconnect bandwidth. The real question is whether older but high-VRAM workstation cards are a better buy than newer consumer GPUs for local inference.
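The interconnect numbers above can be put in perspective with some back-of-envelope arithmetic. This sketch compares the time to move a per-token activation tensor over the NVLink bridge versus a typical PCIe 3.0 x16 slot; the 2 MB payload and the PCIe figure are illustrative assumptions, not measurements from the post.

```python
# Back-of-envelope interconnect comparison for the two rigs discussed.
# Spec-sheet numbers only; real throughput depends on the inference
# stack (llama.cpp, vLLM, etc.) and how layers are split across GPUs.

def transfer_ms(megabytes: float, gb_per_s: float) -> float:
    """Time to move a tensor across an interconnect, in milliseconds."""
    return megabytes / 1024 / gb_per_s * 1000

# Assume ~2 MB of activations per hop between pipeline stages
# (illustrative figure for a mid-sized model).
nvlink_hop = transfer_ms(2, 100)  # NVLink bridge: up to 100 GB/s
pcie_hop = transfer_ms(2, 16)     # PCIe 3.0 x16: ~16 GB/s

print(f"NVLink hop: {nvlink_hop:.3f} ms, PCIe hop: {pcie_hop:.3f} ms")
```

Both hops are small per token, which is one reason NVLink alone rarely delivers a dramatic inference speedup: the interconnect is usually not the dominant cost unless tensors cross it constantly.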
// ANALYSIS
Hot take: this is only a good deal if the user’s bottleneck is VRAM capacity and their software stack can actually use dual-GPU NVLink well; it is not an automatic throughput upgrade.
- Each Quadro RTX 6000 has 24GB GDDR6, so the pair solves the “fit the model” problem better than 12GB cards.
- NVIDIA’s docs position NVLink as a way to combine memory to 48GB and improve multi-GPU transfer, but that does not guarantee linear speedups for inference.
- For Gemma 31B, the bigger win may be avoiding heavy CPU offload or awkward splitting, which can matter more than raw TFLOPS.
- The setup is older Turing hardware, so efficiency per watt and per dollar may be worse than newer 48GB alternatives if those are available in the market.
- If the user’s current 4x3060 rig is slow because of bandwidth, context length, or model-parallel overhead, NVLink may help some workflows, but it is still a niche, stack-dependent purchase.
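The capacity argument in the bullets above can be made concrete with a rough "does it fit" check: quantized weights plus runtime overhead versus available VRAM. The 0.5 bytes/parameter (Q4-class quantization), 2 GB overhead, and 30B parameter count are all illustrative assumptions, not figures from the post.

```python
# Rough VRAM-fit estimate for local inference: weights + overhead vs capacity.
# All figures are illustrative assumptions, not measurements.

def model_vram_gb(params_b: float, bytes_per_param: float,
                  overhead_gb: float = 2.0) -> float:
    """Approximate VRAM in GB for quantized weights plus KV cache/overhead."""
    return params_b * bytes_per_param + overhead_gb

# A ~30B-class model at Q4-style quantization (~0.5 bytes/param):
needed = model_vram_gb(30, 0.5)

dual_quadro = 2 * 24  # 48 GB total, in two 24 GB slices
quad_3060 = 4 * 12    # 48 GB total, in four 12 GB slices

print(f"~{needed:.0f} GB needed; both rigs offer 48 GB, "
      f"but per-GPU slices are {dual_quadro // 2} GB vs {quad_3060 // 4} GB")
```

Both rigs have the same aggregate VRAM, so the practical difference is how many split boundaries the model must cross: larger 24 GB slices mean fewer layer splits and less inter-GPU traffic, which is the capacity-first framing of the headline.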
// TAGS
nvidia, quadro-rtx-6000, nvlink, gemma, llm, local-ai, gpu, inference, workstation
DISCOVERED
2026-04-09
PUBLISHED
2026-04-09
RELEVANCE
9/10
AUTHOR
Buildthehomelab