OPEN_SOURCE · REDDIT · BENCHMARK RESULT · 21d ago

V100 32GB hits 115 t/s on Qwen MoE

Legacy NVIDIA V100 32GB GPUs are seeing a second life in local LLM hosting, achieving 115 t/s on the Qwen3-30B-A3B MoE model. For $500, the aging datacenter card offers a price-to-performance ratio that challenges modern Apple Silicon and RTX 40-series setups.

// ANALYSIS

Mixture-of-Experts (MoE) architectures like Qwen3-30B-A3B activate only a small fraction of their parameters per token (roughly 3B of 30B), so decode speed is bound by memory bandwidth rather than compute, which plays directly to the V100's 900 GB/s of HBM2. The 32GB of VRAM also leaves room to run 30B-class models at Q5 quantization, a quality step above the 4-bit ceiling that 16GB/24GB consumer cards impose. With used PCIe V100 prices near a market bottom, the card is a cheap entry point into multi-GPU inference rigs, though NVLink requires the SXM2 variant; PCIe cards are limited to PCIe peer-to-peer transfers. Software support is tapering off, but the raw hardware throughput remains competitive for inference-only workloads.
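To see why 115 t/s is plausible, here is a rough bandwidth-bound sanity check in Python. The bits-per-weight figure is an assumption (the post does not state the exact quantization; ~5.5 bits/weight approximates a Q5_K_M-style GGUF), as is the simplification that decode reads only the ~3B active parameters once per token; the 900 GB/s bandwidth is the V100 32GB spec and the 30B/3B split comes from the model name.

```python
# Back-of-envelope estimate, not a measurement. Assumptions: ~5.5 bits/weight
# (Q5_K_M-style quantization, not stated in the post) and purely
# bandwidth-bound decode that reads each active parameter once per token.

HBM2_BANDWIDTH_GBS = 900   # V100 32GB spec
TOTAL_PARAMS_B = 30        # Qwen3-30B-A3B total parameters
ACTIVE_PARAMS_B = 3        # ~3B parameters active per token (the "A3B")
BITS_PER_WEIGHT = 5.5      # assumed Q5-level quantization

# All 30B weights must fit in VRAM, active or not.
weights_gb = TOTAL_PARAMS_B * 1e9 * BITS_PER_WEIGHT / 8 / 1e9
print(f"weight footprint: {weights_gb:.1f} GB")   # ~20.6 GB, fits in 32 GB

# But only the ~3B active parameters are read per decoded token.
read_per_token_gb = ACTIVE_PARAMS_B * 1e9 * BITS_PER_WEIGHT / 8 / 1e9
ceiling_tps = HBM2_BANDWIDTH_GBS / read_per_token_gb
print(f"bandwidth ceiling: ~{ceiling_tps:.0f} t/s")          # ~436 t/s
print(f"115 t/s is ~{115 / ceiling_tps:.0%} of the ceiling")  # ~26%
```

The estimate shows the shape of the argument: the full 30B weights must fit in VRAM (hence the 32GB requirement), but only the active 3B are touched per token, so a bandwidth-heavy, compute-light card like the V100 lands at a believable fraction of its theoretical ceiling.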

// TAGS
nvidia-v100, nvidia, gpu, benchmark, qwen, moe, local-llm

DISCOVERED

2026-03-22

PUBLISHED

2026-03-22

RELEVANCE

8/10

AUTHOR

icepatfork