OPEN_SOURCE ↗
REDDIT // 14d ago // INFRASTRUCTURE
RTX PRO 6000, A100 clash over dense inference
A LocalLLaMA thread asks which pair is faster for the biggest dense model that fits both: 2x RTX PRO 6000 Blackwell 96GB on PCIe Gen5 with NVFP4, or 2x A100 80GB Ampere with NVLink and W4A16. The real question is whether Blackwell's FP4-first stack can outrun A100's HBM2e bandwidth and NVLink path.
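To make "the biggest dense model that fits both" concrete, here is a rough sizing sketch. The reserve fraction and bytes-per-parameter figures are assumptions (4-bit weights, ~25% of VRAM held back for KV cache, activations, and framework overhead), not measurements from the thread:

```python
# Rough sizing sketch (assumptions, not benchmarks): estimate the largest
# dense model that fits BOTH rigs with 4-bit weights, reserving headroom
# for KV cache, activations, and serving-framework overhead.

def max_params_b(total_vram_gb: float, bytes_per_param: float = 0.5,
                 reserve_frac: float = 0.25) -> float:
    """Max model size in billions of parameters; reserve_frac is an
    assumed share of VRAM kept free for KV cache and overhead."""
    usable_bytes = total_vram_gb * 1e9 * (1 - reserve_frac)
    return usable_bytes / bytes_per_param / 1e9

# 2x A100 80GB = 160 GB is the binding constraint; 2x RTX PRO 6000 = 192 GB.
print(f"fits both rigs: ~{max_params_b(160):.0f}B params at 4-bit")
print(f"Blackwell-only headroom: ~{max_params_b(192):.0f}B params at 4-bit")
```

Under these assumptions the A100 pair caps the shared model size at roughly 240B parameters, while the Blackwell pair alone could stretch to roughly 288B.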
// ANALYSIS
My bet: the RTX PRO 6000 pair is the better default for modern dense inference, but only if the serving stack can actually hit NVFP4 end-to-end. The A100 pair still has a bandwidth-first story, so the winner will depend more on backend and model shape than the SKU names suggest.
- Blackwell's 5th-gen Tensor Cores add FP4, and NVIDIA's NVFP4 guidance shows the format already reaching TensorRT-LLM and vLLM, so the Blackwell path is practical, not theoretical.
- Two RTX PRO 6000 boards give 192GB aggregate memory versus 160GB on two A100 80GBs, which matters once KV cache and long contexts enter the picture.
- A100 still leads on raw per-GPU bandwidth, with up to 2.039 TB/s of HBM2e and 600 GB/s NVLink bridge bandwidth for two GPUs, so token-generation-heavy serving can remain competitive.
- RTX PRO 6000 brings 96GB GDDR7 per card, 1,792 GB/s bandwidth, and PCIe Gen5, so it trades some raw HBM bandwidth for a newer precision stack and a faster host link.
- –For a fair read, benchmark the exact model and serving framework; quantization quality and sharding overhead will dominate once the model fits on both rigs.
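The bandwidth argument above can be put in numbers with a back-of-envelope decode model: single-stream token generation is roughly memory-bandwidth-bound, so tokens/s is approximately aggregate bandwidth divided by the bytes read per token (about the 4-bit model size). The 0.6 efficiency factor is an assumed discount for kernel and interconnect overhead, not a measured value:

```python
# Back-of-envelope decode-speed estimate (a sketch, not a benchmark):
# tokens/s ≈ aggregate memory bandwidth / bytes read per token,
# scaled by an assumed efficiency factor for kernel/interconnect overhead.

def decode_tokens_per_s(bw_tb_s_per_gpu: float, n_gpus: int,
                        model_params_b: float, bytes_per_param: float = 0.5,
                        efficiency: float = 0.6) -> float:
    """Bandwidth-bound tokens/s for tensor-parallel decode; efficiency
    is a guessed factor covering sharding and launch overhead."""
    agg_bw = bw_tb_s_per_gpu * 1e12 * n_gpus
    bytes_per_token = model_params_b * 1e9 * bytes_per_param
    return agg_bw * efficiency / bytes_per_token

# Spec-sheet bandwidths from the post: A100 ~2.039 TB/s, RTX PRO 6000 ~1.792 TB/s.
for name, bw in [("2x A100 80GB", 2.039), ("2x RTX PRO 6000", 1.792)]:
    print(f"{name}: ~{decode_tokens_per_s(bw, 2, 120):.0f} tok/s on a 120B model")
```

Both rigs land in the same ballpark under this model, which supports the point that backend maturity, quantization quality, and sharding overhead, rather than raw bandwidth, will likely decide the winner.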
// TAGS
llm · inference · gpu · benchmark · rtx-pro-6000-blackwell · a100-80gb
DISCOVERED
2026-03-29 (14d ago)
PUBLISHED
2026-03-29 (14d ago)
RELEVANCE
8/10
AUTHOR
RealTime3392