OPEN_SOURCE
REDDIT // 35d ago · INFRASTRUCTURE
Budget 70B rigs revive old Xeons
A LocalLLaMA discussion asks whether a used Xeon workstation with three RTX 3090s on X99 is the cheapest realistic path to running 70B-class models locally, and whether dropping further to X79 and DDR3 would meaningfully hurt experimentation. It is less a product launch than a snapshot of where the local AI scene still is: chasing maximum VRAM per dollar with retired workstation parts.
// ANALYSIS
This is the most practical side of local AI right now — not model hype, but brutal cost engineering around VRAM, lanes, power, and memory generation.
- The core bet is sound: three 24GB 3090s (72GB combined) are still one of the cheapest ways to get enough VRAM for serious 70B experimentation; a rough budget sketch follows this list
- X79 and X99 both come from Intel’s 40-lane HEDT/server era, so the question is less “will it boot” and more “how much pain do you accept” in bandwidth, thermals, and platform age
- X99 is the cleaner choice because newer Xeons and DDR4 reduce platform drag, even if the GPUs still do most of the real inference work
- Recent llama.cpp multi-GPU discussions suggest performance bottlenecks often come from orchestration and CPU/RAM offload overhead, not just raw PCIe bandwidth; see the llama-cpp-python sketch after this list
- The post is notable because it captures the real local-LLM market: old enterprise gear stays relevant as long as consumer GPUs keep winning on VRAM-per-dollar
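A quick way to sanity-check the "three 3090s cover a 70B model" claim is plain arithmetic. A minimal sketch, assuming a Q4_K_M-style quant at roughly 0.6 bytes per parameter and about 10GB of KV cache and compute buffers; both figures are rough assumptions, not measurements from the thread:

```python
# Back-of-envelope VRAM budget for a 70B model on three 24GB cards.
PARAMS_B = 70            # parameters, in billions
BYTES_PER_PARAM = 0.6    # ~4.8 bits/param, Q4_K_M-ish quant (assumption)
OVERHEAD_GB = 10         # KV cache + compute buffers at moderate context (assumption)
VRAM_PER_GPU_GB = 24
NUM_GPUS = 3

weights_gb = PARAMS_B * BYTES_PER_PARAM
needed_gb = weights_gb + OVERHEAD_GB
available_gb = VRAM_PER_GPU_GB * NUM_GPUS

print(f"weights ~{weights_gb:.0f} GB, needed ~{needed_gb:.0f} GB, "
      f"available {available_gb} GB")
# -> weights ~42 GB, needed ~52 GB, available 72 GB: fits with headroom,
#    which is why the 3x3090 build keeps coming up in these threads.
```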
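On the orchestration point: llama.cpp handles the multi-GPU split itself, so the knobs that matter most here are how many layers go to the GPUs and how tensors are divided between them. A minimal sketch using the llama-cpp-python bindings; the model path is hypothetical, and the even split is just a starting point for three identical cards:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-70b.Q4_K_M.gguf",  # hypothetical path
    n_gpu_layers=-1,               # offload all layers; avoid CPU/RAM offload,
                                   # which is where much of the slowdown hides
    tensor_split=[1.0, 1.0, 1.0],  # even split across the three 3090s
    n_ctx=4096,
)

out = llm("Q: Is X99 still viable for local LLMs? A:", max_tokens=64)
print(out["choices"][0]["text"])
```

Keeping every layer resident on the GPUs is the whole point of the build: once layers spill into system RAM, the DDR3-vs-DDR4 question presumably starts to bite far harder than PCIe lane counts do.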
// TAGS
geforce-rtx-3090 · llm · gpu · inference · self-hosted
DISCOVERED
2026-03-07
PUBLISHED
2026-03-07
RELEVANCE
6/10
AUTHOR
Appropriate-Cap3257