V100 32GB hits 115 t/s on Qwen MoE
Legacy NVIDIA V100 32GB GPUs are seeing a second life in local LLM hosting, achieving 115 t/s on the Qwen3-30B-A3B MoE model. For $500, the aging datacenter card offers a price-to-performance ratio that challenges modern Apple Silicon and RTX 40-series setups.
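That figure is worth a sanity check: single-GPU decode throughput is typically memory-bandwidth-bound, so the ceiling is roughly bandwidth divided by the bytes of active weights read per token. A back-of-envelope sketch, assuming ~3B active parameters for the A3B variant and ~5.5 bits/weight for Q5 (both assumed values, not from the article):

```python
# Bandwidth-bound decode ceiling for an MoE model on a V100 32GB.
# Assumptions: ~3B active params per token (Qwen3-30B-A3B),
# Q5 quantization at ~5.5 bits/weight, 900 GB/s HBM2 bandwidth.
ACTIVE_PARAMS = 3e9       # parameters touched per decoded token
BITS_PER_WEIGHT = 5.5     # approximate effective size of Q5
BANDWIDTH_GBS = 900       # V100 32GB HBM2 peak, GB/s

bytes_per_token = ACTIVE_PARAMS * BITS_PER_WEIGHT / 8   # ~2.06 GB
ceiling_tps = BANDWIDTH_GBS * 1e9 / bytes_per_token     # ~436 t/s

print(f"weights read per token: {bytes_per_token / 1e9:.2f} GB")
print(f"theoretical ceiling: {ceiling_tps:.0f} t/s")
print(f"observed 115 t/s = {115 / ceiling_tps:.0%} of ceiling")
```

By this estimate, 115 t/s is about a quarter of the theoretical ceiling, a plausible efficiency for a pre-Ampere architecture running a modern MoE stack.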
Mixture-of-Experts (MoE) architectures like Qwen3's A3B variant activate only a small fraction of their parameters per token (roughly 3B of 30B), so decoding is dominated by memory traffic rather than compute, which plays to the V100's 900 GB/s of HBM2 bandwidth. The 32GB of VRAM allows Q5 quantization of 30B-class models, a quality step up from the 4-bit ceiling of 16GB/24GB consumer cards. Additionally, prices for the PCIe variant are at a market bottom, making it a cheap entry point for budget multi-GPU inference rigs (note that NVLink is limited to the SXM modules; PCIe cards fall back to PCIe transfers). While software support is tapering off, the raw hardware throughput remains competitive for inference, if not for training.
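The VRAM claim can be checked the same way: unlike the active set, the *full* expert weights must be resident. A sketch with assumed sizes (~30B total parameters, ~5.5 bits/weight for Q5 and ~4.25 for a typical 4-bit scheme; both are approximations, not figures from the article):

```python
def weights_gb(total_params: float, bits_per_weight: float) -> float:
    """Approximate in-VRAM size of quantized model weights, in GB."""
    return total_params * bits_per_weight / 8 / 1e9

TOTAL_PARAMS = 30e9  # full expert set, all resident during inference

q5 = weights_gb(TOTAL_PARAMS, 5.5)   # ~20.6 GB of weights
q4 = weights_gb(TOTAL_PARAMS, 4.25)  # ~15.9 GB of weights

print(f"Q5 weights: {q5:.1f} GB, Q4 weights: {q4:.1f} GB")
```

On a 24GB card, Q5 weights alone leave only ~3 GB for KV cache and activations, which is why 4-bit is the practical ceiling there; the V100's 32GB leaves comfortable headroom at Q5.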
DISCOVERED: 2026-03-22
PUBLISHED: 2026-03-22
AUTHOR: icepatfork