OPEN_SOURCE
REDDIT · 3d ago · INFRASTRUCTURE
3x 3090 rig enables local Qwen 3.5 122B
A Reddit user evaluates adding a third RTX 3090 to an X99 system with a Xeon E5-2680 v4 CPU to run Qwen 3.5 122B inference. The resulting 72GB of total VRAM crosses the critical threshold for running frontier-class models locally, making the hardware investment viable for high-speed local AI.
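The 72GB claim is easy to sanity-check with back-of-envelope math. A minimal sketch, assuming a ~3.9 effective bits-per-weight figure for Q3_K_M (an approximation, not an official spec):

```python
# Rough VRAM-fit check for the triple-3090 build described above.
# The bits-per-weight value is an assumed approximation for Q3_K_M.

PARAMS = 122e9       # total parameters (Qwen 3.5 122B)
BPW_Q3_K_M = 3.9     # assumed effective bits per weight for Q3_K_M quants
VRAM_GB = 3 * 24     # three RTX 3090s at 24GB each

weights_gb = PARAMS * BPW_Q3_K_M / 8 / 1e9   # bits -> bytes -> GB
headroom_gb = VRAM_GB - weights_gb

print(f"quantized weights: ~{weights_gb:.1f} GB")
print(f"headroom for KV cache and buffers: ~{headroom_gb:.1f} GB")
```

At these assumed numbers the weights land around 60GB, leaving roughly 12GB for KV cache and runtime buffers, which is why 72GB is the tipping point rather than 48GB.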
// ANALYSIS
A triple-3090 build sits in the "Goldilocks" zone for high-performance local inference, and is specifically well suited to the new generation of Mixture-of-Experts (MoE) architectures.
- 72GB of VRAM fits Q3_K_M quants of Qwen 3.5 122B, yielding ~35–45 tokens/sec because the MoE design activates only ~10B parameters per token.
- PCIe 3.0 x8 for the third card is not a bottleneck; MoE inference is primarily limited by VRAM bandwidth, not PCIe bus speed.
- The Xeon E5-2680 v4's modest single-core performance will lengthen time-to-first-token (prompt processing) but will not throttle actual token generation speed.
- DDR4-2400 system RAM on the X99 platform severely degrades performance if the model spills past 72GB of VRAM, making 4-bit quants (80GB+) functionally unusable.
- For maximum context windows (128k+), dropping to IQ3_XS quants (~45GB) leaves significant VRAM headroom for the KV cache.
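The last bullet's KV-cache headroom can be sketched numerically. The layer count, KV-head count, and head dimension below are assumed illustrative values (Qwen 3.5 122B's real architecture may differ), and the 1 byte/element figure assumes a q8_0-quantized KV cache:

```python
# Hedged KV-cache sizing sketch at 128k context.
# All architecture numbers below are assumptions for illustration.

N_LAYERS = 60        # assumed transformer layers
N_KV_HEADS = 8       # assumed GQA key/value heads
HEAD_DIM = 128       # assumed per-head dimension
BYTES_PER_ELEM = 1   # q8_0-quantized KV cache (use 2 for fp16)
CTX = 131_072        # 128k-token context window

# factor of 2 covers both K and V tensors
kv_gb = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * BYTES_PER_ELEM * CTX / 1e9
weights_gb = 45      # ~IQ3_XS size cited in the bullet above

print(f"KV cache at 128k: ~{kv_gb:.1f} GB")
print(f"weights + KV cache: ~{weights_gb + kv_gb:.1f} GB of 72 GB")
```

Under these assumptions the KV cache runs ~16GB, so IQ3_XS weights plus a full 128k cache stay comfortably inside the 72GB budget, while a Q3_K_M-sized model would not.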
// TAGS
qwen-3.5 · llm · inference · gpu · self-hosted · open-weights
DISCOVERED
2026-04-09 (3d ago)
PUBLISHED
2026-04-08 (3d ago)
RELEVANCE
8/10
AUTHOR
robertpro01