OPEN_SOURCE ↗
REDDIT // INFRASTRUCTURE
RTX 6000 x4 build weighs Qwen3.5 models
A r/LocalLLaMA user with four RTX 6000 Max-Q cards and 768GB RAM is trying to pick the best local models for code auditing, fuzzing, and other security tooling with minimal quality loss. The thread centers on Qwen3.5-122B-A10B and Qwen3.5-397B-A17B, while commenters push a tiered setup instead of one giant model.
// ANALYSIS
Both candidates are MoE models, so active parameters matter more than headline size. The real decision is less "122B vs 397B" and more "which compromise gives you enough quality without making the serving stack too fragile?"
- Qwen3.5-122B-A10B is 122B total / 10B active; at BF16 its weights run roughly 244 GB, which makes it the cleaner quality-first choice for everyday local use: https://huggingface.co/Qwen/Qwen3.5-122B-A10B
- Qwen3.5-397B-A17B is 397B total / 17B active, which makes Q6_K a sensible fit strategy, but still a deliberate compromise rather than a no-brainer default (see the sizing sketch after this list): https://huggingface.co/Qwen/Qwen3.5-397B-A17B
- Qwen’s own serving docs lean on current vLLM, SGLang, and KTransformers builds, and vLLM’s `--language-model-only` can free memory for more KV cache if you are not using vision. I’m inferring that a 4-GPU setup will want tighter context limits or more aggressive quantization than the docs’ 8-GPU examples show; the KV-cache sketch below shows why.
- For fuzzing and code auditing, a smaller task model plus a CPU-side helper is likely to beat forcing one giant model to do everything (a minimal routing sketch follows).
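To make the fit question concrete, here is a back-of-envelope weight-memory check. Assumptions not stated in the thread: 96 GB per card (the RTX PRO 6000 Blackwell Max-Q figure) and ~6.56 bits/weight for Q6_K (llama.cpp's nominal rate). Real servers also need room for KV cache, activations, and runtime overhead, so treat these numbers as a floor, not a guarantee.

```python
# Rough weight-memory check for the two candidates on a 4-GPU box.
# 96 GB/card and 6.56 bpw for Q6_K are assumptions, not thread facts;
# KV cache, activations, and framework overhead are ignored here.

GPUS, GB_PER_GPU = 4, 96
BUDGET_GB = GPUS * GB_PER_GPU  # 384 GB total VRAM

def weight_gb(total_params_b: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes)."""
    return total_params_b * 1e9 * bits_per_weight / 8 / 1e9

candidates = {
    "Qwen3.5-122B-A10B @ BF16": weight_gb(122, 16),    # ~244 GB
    "Qwen3.5-397B-A17B @ Q6_K": weight_gb(397, 6.56),  # ~326 GB
}
for name, gb in candidates.items():
    print(f"{name}: ~{gb:.0f} GB weights, ~{BUDGET_GB - gb:.0f} GB headroom")
```

The takeaway matches the thread: BF16 122B leaves real headroom for KV cache, while Q6_K 397B fits but crowds out long contexts.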
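On why context limits bite: per-sequence KV cache grows linearly with context length. The layer and head counts below are hypothetical placeholders, not the published Qwen3.5 config; swap in the real values from the model's config.json before trusting the output.

```python
# KV cache per sequence: K and V tensors for every layer.
# Layer/head/dim values are HYPOTHETICAL placeholders, not Qwen3.5's
# actual config; replace them with numbers from config.json.

def kv_gb(tokens: int, layers: int, kv_heads: int, head_dim: int,
          bytes_per_elem: int = 2) -> float:
    """Per-sequence KV cache in GB (BF16 cache by default)."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * tokens / 1e9

# e.g. 60 layers, 8 GQA KV heads, head_dim 128:
for ctx in (32_768, 131_072):
    print(f"{ctx:>7} tokens -> ~{kv_gb(ctx, 60, 8, 128):.1f} GB per sequence")
```

Under those placeholder numbers a single 128K-token sequence already costs ~32 GB, which is exactly the headroom a Q6_K 397B deployment would be short on.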
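Finally, a minimal sketch of the tiered setup commenters push, assuming two OpenAI-compatible local servers (the ports and model names here are made up for illustration): send cheap, mechanical passes to a small model and escalate only flagged files to the big one.

```python
# Tiered routing sketch: triage with a small model, escalate deep audits.
# Endpoints and model names are hypothetical; both assume local
# OpenAI-compatible servers (vLLM, SGLang, etc.).
import requests

TIERS = {
    "triage": ("http://localhost:8001/v1/chat/completions", "small-coder"),
    "audit":  ("http://localhost:8002/v1/chat/completions", "qwen3.5-122b-a10b"),
}

def ask(tier: str, prompt: str) -> str:
    url, model = TIERS[tier]
    r = requests.post(url, json={
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }, timeout=300)
    r.raise_for_status()
    return r.json()["choices"][0]["message"]["content"]

# Cheap pass first; only suspicious diffs go to the expensive tier.
verdict = ask("triage", "Flag suspicious functions in this diff: ...")
```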
// TAGS
qwen-3.5 · llm · gpu · inference · open-weights · self-hosted · code-review · testing
DISCOVERED
2026-03-22
PUBLISHED
2026-03-22
RELEVANCE
8/10
AUTHOR
Direct_Bodybuilder63