OPEN_SOURCE
REDDIT · 3d ago · INFRASTRUCTURE
3x 3090 rig enables local Qwen 3.5 122B
A Reddit user evaluates adding a third RTX 3090 to an X99 system with a Xeon E5-2680 v4 CPU to run Qwen 3.5 122B inference. The resulting 72GB of total VRAM crosses the critical threshold for running frontier-class models locally, making the hardware investment viable for high-speed local AI.
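The 72GB claim is easy to sanity-check with back-of-envelope math. A minimal sketch, assuming a ~3.9 effective bits-per-weight figure for Q3_K_M (an approximation, not an official spec):

```python
# Rough VRAM-fit check for the triple-3090 build described above.
# The bits-per-weight value is an assumed approximation for Q3_K_M.

PARAMS = 122e9       # total parameters (Qwen 3.5 122B)
BPW_Q3_K_M = 3.9     # assumed effective bits per weight for Q3_K_M quants
VRAM_GB = 3 * 24     # three RTX 3090s at 24GB each

weights_gb = PARAMS * BPW_Q3_K_M / 8 / 1e9   # bits -> bytes -> GB
headroom_gb = VRAM_GB - weights_gb

print(f"quantized weights: ~{weights_gb:.1f} GB")
print(f"headroom for KV cache and buffers: ~{headroom_gb:.1f} GB")
```

At these assumed numbers the weights land around 60GB, leaving roughly 12GB for KV cache and runtime buffers, which is why 72GB is the tipping point rather than 48GB.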
// ANALYSIS
A triple-3090 build sits in the "Goldilocks" zone for high-performance local inference, and is specifically well suited to the new generation of Mixture-of-Experts (MoE) architectures.
- 72GB of VRAM fits Q3_K_M quants of Qwen 3.5 122B, yielding ~35–45 tokens/sec because the MoE design activates only ~10B parameters per token.
- PCIe 3.0 x8 for the third card is not a bottleneck; MoE inference is primarily limited by VRAM bandwidth, not PCIe bus speed.
- The Xeon E5-2680 v4's modest single-core performance will lengthen time-to-first-token (prompt processing) but will not throttle actual token generation speed.
- DDR4-2400 system RAM on the X99 platform severely degrades performance if the model spills past 72GB of VRAM, making 4-bit quants (80GB+) functionally unusable.
- For maximum context windows (128k+), dropping to IQ3_XS quants (~45GB) leaves significant VRAM headroom for the KV cache.
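The last bullet's KV-cache headroom can be sketched numerically. The layer count, KV-head count, and head dimension below are assumed illustrative values (Qwen 3.5 122B's real architecture may differ), and the 1 byte/element figure assumes a q8_0-quantized KV cache:

```python
# Hedged KV-cache sizing sketch at 128k context.
# All architecture numbers below are assumptions for illustration.

N_LAYERS = 60        # assumed transformer layers
N_KV_HEADS = 8       # assumed GQA key/value heads
HEAD_DIM = 128       # assumed per-head dimension
BYTES_PER_ELEM = 1   # q8_0-quantized KV cache (use 2 for fp16)
CTX = 131_072        # 128k-token context window

# factor of 2 covers both K and V tensors
kv_gb = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * BYTES_PER_ELEM * CTX / 1e9
weights_gb = 45      # ~IQ3_XS size cited in the bullet above

print(f"KV cache at 128k: ~{kv_gb:.1f} GB")
print(f"weights + KV cache: ~{weights_gb + kv_gb:.1f} GB of 72 GB")
```

Under these assumptions the KV cache runs ~16GB, so IQ3_XS weights plus a full 128k cache stay comfortably inside the 72GB budget, while a Q3_K_M-sized model would not.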
// TAGS
qwen-3.5 · llm · inference · gpu · self-hosted · open-weights
DISCOVERED
2026-04-09 (3d ago)
PUBLISHED
2026-04-08 (3d ago)
RELEVANCE
8/10
AUTHOR
robertpro01