REDDIT · INFRASTRUCTURE · 35d ago

Budget 70B rigs revive old Xeons

A LocalLLaMA discussion asks whether a used Xeon workstation with three RTX 3090s on X99 is the cheapest realistic path to running 70B-class models locally, and whether dropping further to X79 and DDR3 would meaningfully hurt experimentation. It is less a product launch than a snapshot of where the local AI scene still is: chasing maximum VRAM per dollar with retired workstation parts.

// ANALYSIS

This is the most practical side of local AI right now — not model hype, but brutal cost engineering around VRAM, lanes, power, and memory generation.

  • The core bet is sound: three 24GB 3090s are still one of the cheapest ways to get enough combined VRAM for serious 70B experimentation
  • X79 and X99 both come from Intel’s 40-lane HEDT/server era, so the question is less “will it boot” and more “how much pain do you accept in bandwidth, thermals, and platform age”
  • X99 is the cleaner choice because newer Xeons and DDR4 reduce platform drag, even if the GPUs still do most of the real inference work
  • Recent llama.cpp multi-GPU discussions suggest performance bottlenecks often come from orchestration and CPU/RAM offload overhead, not just raw PCIe bandwidth
  • The post is notable because it captures the real local-LLM market: old enterprise gear stays relevant as long as consumer GPUs keep winning on VRAM-per-dollar
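The "enough combined VRAM" bet above is easy to sanity-check with rough arithmetic. The sketch below (bits-per-weight figures are approximate averages for common llama.cpp GGUF quantizations, and the 6 GB headroom for KV cache and CUDA buffers is an assumption) compares a 70B model's weight footprint against the 72 GB of three 3090s:

```python
# Back-of-envelope VRAM estimate for a 70B-parameter model at common
# GGUF quantization levels, versus three RTX 3090s (3 x 24 GB = 72 GB).
# Bits-per-weight values are approximate effective averages, not exact.

PARAMS = 70e9
GPU_VRAM_GB = 3 * 24

quants = {"Q8_0": 8.5, "Q5_K_M": 5.7, "Q4_K_M": 4.8, "Q3_K_M": 3.9}

for name, bpw in quants.items():
    weights_gb = PARAMS * bpw / 8 / 1e9  # bits -> bytes -> GB
    # assume ~6 GB headroom for KV cache and runtime buffers
    fits = weights_gb + 6 < GPU_VRAM_GB
    print(f"{name}: ~{weights_gb:.0f} GB weights -> "
          f"{'fits' if fits else 'too big'} in {GPU_VRAM_GB} GB")
```

By this estimate Q4/Q5 quants fit with room for context, while Q8 does not, which is consistent with the thread's framing of three 3090s as the floor for "serious" 70B work rather than a comfortable ceiling.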
// TAGS
geforce-rtx-3090 · llm · gpu · inference · self-hosted

DISCOVERED

2026-03-07 (35d ago)

PUBLISHED

2026-03-07 (35d ago)

RELEVANCE

6 / 10

AUTHOR

Appropriate-Cap3257