OPEN_SOURCE
REDDIT // 27d ago · INFRASTRUCTURE
GLM-4.7 358B fits dual RTX Pro 6000 Blackwell?
A user on r/LocalLLaMA is asking whether the full 358B GLM-4.7 model fits in 192GB of VRAM across dual RTX Pro 6000 Blackwell GPUs, specifically probing whether NVFP4 quantization clears the threshold for batch size 1 inference.
// ANALYSIS
A dual RTX Pro 6000 Blackwell setup with 192GB of combined VRAM is one of the first consumer/prosumer configurations that can plausibly run 300B+ model inference locally; this community thread is an early signal of what's possible at this tier.
- GLM-4.7 at 358B is a massive open-weights model from Zhipu AI; fitting it locally at any reasonable quant is a real capability milestone
- NVFP4 on Blackwell is a new quantization path that theoretical calculators may not yet accurately model; real-world headroom often exceeds estimates (see the back-of-envelope sketch after this list)
- The use case (roleplay + tool calling + RAG) is typical of power users who need uncensored or customizable models that hosted APIs don't allow
- If NVFP4 fits cleanly, this becomes a strong data point for the viability of sub-$20K local inference setups for frontier-class models
- Community answers here will inform purchasing decisions for others eyeing this GPU config
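
As a sanity check on the fit question, here is a minimal back-of-envelope sketch in Python. The effective bits-per-weight figures are assumptions for illustration, not from the thread: pure FP4 is 4.0 bpw, while real NVFP4 checkpoints carry per-block scale factors and often keep embeddings and other sensitive layers in higher precision, pushing the effective rate toward 4.5 bpw. The sketch also ignores the GB/GiB distinction and runtime overheads such as the CUDA context.

# Back-of-envelope check: do 358B parameters fit in 192 GB at NVFP4?
# Assumed effective bits-per-weight values (illustrative only): 4.0 for
# pure FP4, up to ~4.5 once per-block scales and higher-precision
# layers (embeddings, norms) are counted.

def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Weight memory in decimal GB: billions of params * bits / 8 bits-per-byte."""
    return params_billion * bits_per_weight / 8

TOTAL_VRAM_GB = 2 * 96  # dual RTX Pro 6000 Blackwell, 96 GB each

for bpw in (4.0, 4.25, 4.5):
    w = weights_gb(358, bpw)
    print(f"{bpw} bpw: weights {w:6.1f} GB, headroom {TOTAL_VRAM_GB - w:+6.1f} GB")

# 4.0  bpw -> 179.0 GB weights, +13.0 GB headroom for KV cache/activations
# 4.25 bpw -> 190.2 GB weights,  +1.8 GB headroom: marginal
# 4.5  bpw -> 201.4 GB weights,  -9.4 GB: does not fit

The arithmetic shows why calculator estimates disagree: at a true 4.0 bpw the weights leave roughly 13 GB for KV cache and activations at batch size 1, while anything near 4.5 bpw effective does not fit at all.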
// TAGS
glm-4.7 · llm · inference · gpu · open-weights · self-hosted
DISCOVERED
2026-03-15 (27d ago)
PUBLISHED
2026-03-15 (27d ago)
RELEVANCE
6/10
AUTHOR
mircM52