OPEN_SOURCE
REDDIT // 35d ago · INFRASTRUCTURE
Budget 70B rigs revive old Xeons
A LocalLLaMA discussion asks whether a used Xeon workstation with three RTX 3090s on X99 is the cheapest realistic path to running 70B-class models locally, and whether dropping further to X79 and DDR3 would meaningfully hurt experimentation. It is less a product launch than a snapshot of where the local AI scene still is: chasing maximum VRAM per dollar with retired workstation parts.
// ANALYSIS
This is the most practical side of local AI right now — not model hype, but brutal cost engineering around VRAM, lanes, power, and memory generation.
- The core bet is sound: three 24GB 3090s (72GB combined) are still one of the cheapest ways to get enough VRAM for serious 70B experimentation; a rough budget sketch follows this list
- X79 and X99 both come from Intel’s 40-lane HEDT/server era, so the question is less “will it boot” and more “how much pain do you accept” in bandwidth, thermals, and platform age
- X99 is the cleaner choice because newer Xeons and DDR4 reduce platform drag, even if the GPUs still do most of the real inference work
- Recent llama.cpp multi-GPU discussions suggest performance bottlenecks often come from orchestration and CPU/RAM offload overhead, not just raw PCIe bandwidth; see the llama-cpp-python sketch after this list
- The post is notable because it captures the real local-LLM market: old enterprise gear stays relevant as long as consumer GPUs keep winning on VRAM-per-dollar
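A quick way to sanity-check the "three 3090s cover a 70B model" claim is plain arithmetic. A minimal sketch, assuming a Q4_K_M-style quant at roughly 0.6 bytes per parameter and about 10GB of KV cache and compute buffers; both figures are rough assumptions, not measurements from the thread:

```python
# Back-of-envelope VRAM budget for a 70B model on three 24GB cards.
PARAMS_B = 70            # parameters, in billions
BYTES_PER_PARAM = 0.6    # ~4.8 bits/param, Q4_K_M-ish quant (assumption)
OVERHEAD_GB = 10         # KV cache + compute buffers at moderate context (assumption)
VRAM_PER_GPU_GB = 24
NUM_GPUS = 3

weights_gb = PARAMS_B * BYTES_PER_PARAM
needed_gb = weights_gb + OVERHEAD_GB
available_gb = VRAM_PER_GPU_GB * NUM_GPUS

print(f"weights ~{weights_gb:.0f} GB, needed ~{needed_gb:.0f} GB, "
      f"available {available_gb} GB")
# -> weights ~42 GB, needed ~52 GB, available 72 GB: fits with headroom,
#    which is why the 3x3090 build keeps coming up in these threads.
```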
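On the orchestration point: llama.cpp handles the multi-GPU split itself, so the knobs that matter most here are how many layers go to the GPUs and how tensors are divided between them. A minimal sketch using the llama-cpp-python bindings; the model path is hypothetical, and the even split is just a starting point for three identical cards:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-70b.Q4_K_M.gguf",  # hypothetical path
    n_gpu_layers=-1,               # offload all layers; avoid CPU/RAM offload,
                                   # which is where much of the slowdown hides
    tensor_split=[1.0, 1.0, 1.0],  # even split across the three 3090s
    n_ctx=4096,
)

out = llm("Q: Is X99 still viable for local LLMs? A:", max_tokens=64)
print(out["choices"][0]["text"])
```

Keeping every layer resident on the GPUs is the whole point of the build: once layers spill into system RAM, the DDR3-vs-DDR4 question presumably starts to bite far harder than PCIe lane counts do.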
// TAGS
geforce-rtx-3090 · llm · gpu · inference · self-hosted
DISCOVERED
2026-03-07
PUBLISHED
2026-03-07
RELEVANCE
6/10
AUTHOR
Appropriate-Cap3257