3x 3090 rig enables local Qwen 3.5 122B
OPEN_SOURCE ↗
REDDIT // 3d ago // INFRASTRUCTURE


A Reddit user weighs adding a third RTX 3090 to an X99 system with a Xeon E5-2680 v4 CPU to run Qwen 3.5 122B inference. The combined 72GB of VRAM (3 × 24GB) crosses the threshold for holding frontier-class quantized models entirely on GPU, which is what makes the hardware investment viable for fast local AI.
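The 72GB figure can be sanity-checked with back-of-the-envelope arithmetic. A minimal sketch, assuming ~3.9 bits per weight for a Q3_K_M GGUF quant (an approximation; real file sizes vary by tensor mix):

```python
# Rough check that a Q3_K_M quant of a 122B-parameter model fits in 3x24GB.
# BPW_Q3_K_M is an assumed average bits-per-weight, not an exact spec.
PARAMS = 122e9
BPW_Q3_K_M = 3.91          # approximate average bits per weight
VRAM_GB = 3 * 24           # three RTX 3090s

model_gb = PARAMS * BPW_Q3_K_M / 8 / 1e9
print(f"model ~{model_gb:.1f} GB of {VRAM_GB} GB VRAM")  # ~59.6 GB of 72 GB
```

Under those assumptions the weights alone take roughly 60GB, leaving about 12GB across the three cards for the KV cache and activations.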

// ANALYSIS

A triple-3090 build is the "Goldilocks" zone for high-performance local inference, specifically optimized for the new generation of Mixture-of-Experts (MoE) architectures.

  • 72GB VRAM fits Q3_K_M quants of Qwen 3.5 122B, which can yield ~35–45 tokens/sec because the MoE design activates only ~10B parameters per token.
  • PCIe 3.0 x8 for the third card is not a bottleneck; MoE models are primarily limited by VRAM bandwidth rather than the PCIe bus speed.
  • The Xeon E5-2680 v4's modest single-core performance will increase "Time to First Token" (prompt processing) but will not throttle the actual token generation speed.
  • DDR4-2400 system RAM on the X99 platform will severely degrade performance if the model exceeds 72GB, making 4-bit quants (80GB+) functionally unusable.
  • For maximum context windows (128k+), dropping to IQ3_XS quants (~45GB) leaves significant VRAM headroom for the KV cache.
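The KV-cache headroom point in the last bullet can be sketched numerically. The layer count, KV-head count, and head dimension below are hypothetical GQA-style values for illustration, not published Qwen specs:

```python
# Hedged sketch: KV-cache size at 128k context for a hypothetical GQA config.
# All architecture numbers here are assumptions, not Qwen 3.5 specs.
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_elem=2):
    # 2x for the separate K and V tensors; fp16 (2 bytes) by default
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem

kv = kv_cache_bytes(n_layers=60, n_kv_heads=8, head_dim=128, ctx_len=128 * 1024)
print(f"KV cache ~{kv / 2**30:.1f} GiB")  # ~30.0 GiB at fp16
```

Under these assumed dimensions a full 128k fp16 KV cache costs on the order of 30 GiB, which is why a ~45GB IQ3_XS quant (rather than a ~60GB Q3_K_M) is the practical choice for maximum context; quantizing the KV cache itself (e.g. to 8-bit) would roughly halve that cost.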
// TAGS
qwen-3.5 · llm · inference · gpu · self-hosted · open-weights

DISCOVERED
3d ago (2026-04-09)

PUBLISHED
3d ago (2026-04-08)

RELEVANCE
8/10

AUTHOR
robertpro01