Qwen3.6 Fits RTX 6000 Pro, 96GB
OPEN_SOURCE · REDDIT · 4h ago · INFRASTRUCTURE


The Reddit thread asks whether an RTX 6000 Pro can comfortably serve Qwen3.6, and the consensus is yes: 96GB of VRAM is plenty for the first open-weight Qwen3.6 model. The real constraint is not raw fit but headroom for long context, KV cache, and concurrent sessions.

// ANALYSIS

The hot take: 96GB is not the question here; throughput and serving setup are. For a single-user or small-team local stack, the RTX 6000 Pro looks like a very safe target for Qwen3.6-35B-A3B, but the moment you push longer contexts or multiple sessions, memory headroom gets eaten fast.

  • Qwen3.6-35B-A3B is a sparse MoE model with 35B total and 3B active parameters, so it is much easier to host than a dense model of similar headline size.
  • Official Qwen docs say the model is supported in vLLM, SGLang, llama.cpp, and Transformers, which makes local deployment realistic rather than experimental.
  • Reddit replies suggest 96GB can handle BF16, FP8, or Q8 comfortably, with one commenter saying they already run it in full BF16 on 96GB.
  • The thread also hints at the real tradeoff: if you want big context windows and tool use, KV cache and concurrency matter as much as model weights.
  • For a local coding assistant or RAG server, this is probably overkill in a good way; for a multi-user inference box, you still need to benchmark before assuming “fits” means “fast.”
// TAGS
qwen3.6-35b-a3b · llm · inference · gpu · self-hosted · vram · moe

DISCOVERED

4h ago

2026-04-19

PUBLISHED

8h ago

2026-04-19

RELEVANCE

8/10

AUTHOR

Emergency_Brief_9141