Qwen3-235B-A22B runs on X13, 4 A100s
OPEN_SOURCE ↗
REDDIT // 14d ago · INFRASTRUCTURE

A LocalLLaMA user shows off an X13 server with dual Xeon Silver 4415 CPUs, 1 TB of RAM, and four Nvidia A100s that appears aimed at running Qwen3-235B-A22B. It’s a useful snapshot of what “local” looks like once you move into frontier open-weight models.

// ANALYSIS

Open-weight does not mean lightweight, and this rig is a good reminder that frontier self-hosting still looks a lot like a mini-datacenter. The good news is that Qwen’s software stack is mature enough that this kind of deployment is realistic rather than purely aspirational.

  • Qwen3-235B-A22B is a 235B-parameter MoE model with 22B active parameters, so the name understates how much inference machinery is still involved.
  • Qwen’s official docs show the model being served with multi-GPU tensor parallelism, including 8-way BF16 and 4-way FP8/quantized setups, which makes four A100s a credible target.
  • The 1 TB RAM and dual Xeons likely matter as much as the GPUs for KV cache, host-side sharding, and keeping long-context inference stable.
  • Apache 2.0 licensing plus support in vLLM, SGLang, llama.cpp, Ollama, LM Studio, and TensorRT-LLM lowers software friction, but hardware remains the real moat.
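To see why four A100s only become credible with quantization, here is a rough weight-memory sketch. The parameter count comes from the model name; the 80 GB A100 capacity and the assumption that BF16 uses 2 bytes per parameter and FP8 uses 1 are standard, but the numbers deliberately ignore KV cache, activations, and framework overhead, so real headroom is tighter.

```python
# Back-of-envelope VRAM math for Qwen3-235B-A22B weights.
# Rough sketch only: ignores KV cache, activations, and framework overhead.
GB = 1024**3
total_params = 235e9   # all MoE expert weights must be resident, not just the 22B active
a100_vram_gb = 80      # per-GPU HBM on an 80 GB A100
num_gpus = 4

def weight_gb(bytes_per_param: float) -> float:
    """Memory needed to hold the model weights at a given precision."""
    return total_params * bytes_per_param / GB

bf16 = weight_gb(2.0)  # ~438 GB: exceeds 4 x 80 = 320 GB, hence the 8-way BF16 setups
fp8 = weight_gb(1.0)   # ~219 GB: fits in 320 GB with room left for KV cache
print(f"BF16 ≈ {bf16:.0f} GB, FP8 ≈ {fp8:.0f} GB vs {num_gpus * a100_vram_gb} GB total VRAM")
```

The arithmetic makes the bullet concrete: BF16 weights alone overflow four A100s, while an FP8 or similarly quantized checkpoint fits with enough slack for long-context KV cache, which is where the 1 TB of host RAM also helps.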
// TAGS
qwen3-235b-a22b · llm · gpu · inference · self-hosted · open-weights

DISCOVERED

14d ago

2026-03-29

PUBLISHED

14d ago

2026-03-29

RELEVANCE

8/10

AUTHOR

AutomaticBedroom3870