OPEN_SOURCE
REDDIT // 27d ago · INFRASTRUCTURE
GLM-4.7 358B fits dual RTX Pro 6000 Blackwell?
A user on r/LocalLLaMA is asking whether the full 358B GLM-4.7 model fits in 192GB of VRAM across dual RTX Pro 6000 Blackwell GPUs, specifically probing whether NVFP4 quantization clears the threshold for batch size 1 inference.
// ANALYSIS
A dual RTX Pro 6000 Blackwell setup with 192GB of combined VRAM is one of the first consumer/prosumer configurations that can plausibly run 300B+ model inference locally; this community thread is an early signal of what's possible at this tier.
- GLM-4.7 at 358B is a massive open-weights model from Zhipu AI; fitting it locally at any reasonable quant is a real capability milestone
- NVFP4 on Blackwell is a new quantization path that theoretical calculators may not yet accurately model; real-world headroom often exceeds estimates (see the back-of-envelope sketch after this list)
- The use case (roleplay + tool calling + RAG) is typical of power users who need uncensored or customizable models that hosted APIs don't allow
- If NVFP4 fits cleanly, this becomes a strong data point for the viability of sub-$20K local inference setups for frontier-class models
- Community answers here will inform purchasing decisions for others eyeing this GPU config
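
As a sanity check on the fit question, here is a minimal back-of-envelope sketch in Python. The effective bits-per-weight figures are assumptions for illustration, not from the thread: pure FP4 is 4.0 bpw, while real NVFP4 checkpoints carry per-block scale factors and often keep embeddings and other sensitive layers in higher precision, pushing the effective rate toward 4.5 bpw. The sketch also ignores the GB/GiB distinction and runtime overheads such as the CUDA context.

# Back-of-envelope check: do 358B parameters fit in 192 GB at NVFP4?
# Assumed effective bits-per-weight values (illustrative only): 4.0 for
# pure FP4, up to ~4.5 once per-block scales and higher-precision
# layers (embeddings, norms) are counted.

def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Weight memory in decimal GB: billions of params * bits / 8 bits-per-byte."""
    return params_billion * bits_per_weight / 8

TOTAL_VRAM_GB = 2 * 96  # dual RTX Pro 6000 Blackwell, 96 GB each

for bpw in (4.0, 4.25, 4.5):
    w = weights_gb(358, bpw)
    print(f"{bpw} bpw: weights {w:6.1f} GB, headroom {TOTAL_VRAM_GB - w:+6.1f} GB")

# 4.0  bpw -> 179.0 GB weights, +13.0 GB headroom for KV cache/activations
# 4.25 bpw -> 190.2 GB weights,  +1.8 GB headroom: marginal
# 4.5  bpw -> 201.4 GB weights,  -9.4 GB: does not fit

The arithmetic shows why calculator estimates disagree: at a true 4.0 bpw the weights leave roughly 13 GB for KV cache and activations at batch size 1, while anything near 4.5 bpw effective does not fit at all.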
// TAGS
glm-4.7 · llm · inference · gpu · open-weights · self-hosted
DISCOVERED
2026-03-15 (27d ago)
PUBLISHED
2026-03-15 (27d ago)
RELEVANCE
6/10
AUTHOR
mircM52