OPEN_SOURCE
REDDIT · 14d ago · INFRASTRUCTURE
Qwen3-235B-A22B runs on X13, 4 A100s
A LocalLLaMA user shows off an X13 server with dual Xeon Silver 4415 CPUs, 1 TB of RAM, and four Nvidia A100s, apparently built to run Qwen3-235B-A22B. It’s a useful snapshot of what “local” looks like once you move into frontier open-weight models.
// ANALYSIS
Open-weight does not mean lightweight, and this rig is a good reminder that frontier self-hosting still looks a lot like a mini-datacenter. The good news is that Qwen’s software stack is mature enough that this kind of deployment is realistic rather than purely aspirational.
- Qwen3-235B-A22B is a 235B-parameter MoE model with 22B active parameters, so the name understates how much inference machinery is still involved.
- Qwen’s official docs show the model being served with multi-GPU tensor parallelism, including 8-way BF16 and 4-way FP8/quantized setups, which makes four A100s a credible target.
- The 1 TB of RAM and dual Xeons likely matter as much as the GPUs for KV cache, host-side sharding, and keeping long-context inference stable.
- Apache 2.0 licensing plus support in vLLM, SGLang, llama.cpp, Ollama, LM Studio, and TensorRT-LLM lowers software friction, but hardware remains the real moat.
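The hardware math behind these bullets is worth making explicit. A minimal sketch of the back-of-envelope VRAM arithmetic, assuming 80 GB A100s and approximate bytes-per-parameter figures (the exact footprint depends on the quantization scheme and KV cache settings):

```python
# Back-of-envelope VRAM sizing for Qwen3-235B-A22B on 4x A100 80GB.
# Weight storage scales with the TOTAL parameter count (235B), even though
# only ~22B parameters are active per token in the MoE forward pass.

TOTAL_PARAMS = 235e9
GPU_VRAM_GB = 80
NUM_GPUS = 4

def weights_gb(bytes_per_param: float) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes)."""
    return TOTAL_PARAMS * bytes_per_param / 1e9

budget = GPU_VRAM_GB * NUM_GPUS        # 320 GB across the four GPUs
bf16 = weights_gb(2.0)                 # ~470 GB: does not fit
fp8  = weights_gb(1.0)                 # ~235 GB: fits, little KV-cache headroom
int4 = weights_gb(0.5)                 # ~118 GB: comfortable headroom

print(f"budget={budget:.0f} GB  bf16={bf16:.0f}  fp8={fp8:.0f}  int4={int4:.0f}")
```

This is why a 4-way quantized setup is the credible target here rather than BF16 (which needs an 8-GPU split), and note that A100s lack Hopper-class native FP8 tensor cores, so a weight-only quantized variant is the likelier fit. In vLLM such a deployment would be launched along the lines of `vllm serve Qwen/Qwen3-235B-A22B --tensor-parallel-size 4`, with the quantization flags depending on the chosen checkpoint.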
// TAGS
qwen3-235b-a22b · llm · gpu · inference · self-hosted · open-weights
DISCOVERED
2026-03-29
PUBLISHED
2026-03-29
RELEVANCE
8/10
AUTHOR
AutomaticBedroom3870