OPEN_SOURCE ↗
REDDIT // 21d ago · BENCHMARK RESULT
Xeon LLM Rig Weighs RTX 3090 Upgrade
A Reddit user running a Xeon E5-2696v3, 64GB ECC, and an RTX 3080 10GB reports about 11 tps on Omnicoder-9B at 262k context and asks whether a cheap RTX 3090 would be worth the jump. The thread centers on a familiar local-LLM tradeoff: more VRAM and less CPU spillover versus only modest raw-speed gains.
// ANALYSIS
The 3090 looks less like a speed boost and more like a capacity fix. If the workload is already bumping into VRAM limits, the extra 14GB changes what you can actually run, not just how fast it runs.
- Officially, the RTX 3090 ships with 24GB of GDDR6X on a 384-bit bus, while the RTX 3080 in this class is the 10GB card, so the upgrade is mostly about headroom.
- Commenters in the thread expect roughly a 20% throughput bump at best, but long-context inference usually benefits more from keeping the weights and KV cache resident on the GPU.
- If the model still spills past 24GB, the bottleneck moves to CPU/RAM offload and system plumbing, so dual-GPU complexity may buy less than it sounds like.
- For remote coding assistants and single-user serving, one 3090 is the cleaner path; rebuilding the whole platform only makes sense for bigger models or more concurrency.
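The capacity argument can be made concrete with a back-of-the-envelope VRAM estimate. The sketch below is illustrative only: the layer count, KV-head count, head dimension, and quantization overhead are assumed values for a generic 9B-class model, not Omnicoder-9B's published architecture.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    # Separate K and V tensors per layer, hence the factor of 2;
    # bytes_per_elem=2 assumes an fp16 KV cache.
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem

def weights_bytes(n_params, bytes_per_param):
    # bytes_per_param ~0.55 is a rough figure for a ~4-bit quant plus overhead.
    return n_params * bytes_per_param

GIB = 1024 ** 3

# Hypothetical 9B-class config (assumed, for illustration)
weights = weights_bytes(9e9, 0.55)
kv = kv_cache_bytes(n_layers=40, n_kv_heads=8, head_dim=128, context_len=262_144)

# With these assumptions, the fp16 KV cache alone is 40 GiB at 262k context
print(f"weights ≈ {weights / GIB:.1f} GiB, KV ≈ {kv / GIB:.1f} GiB, "
      f"total ≈ {(weights + kv) / GIB:.1f} GiB")
```

Under these assumptions, a quantized 9B model fits easily, but a full 262k-token fp16 KV cache alone far exceeds 24GB. Whether the 3090 ends spillover therefore depends on KV-cache quantization and how much context actually stays resident, not just the extra 14GB.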
// TAGS
llm · gpu · inference · self-hosted · benchmark · rtx-3090
DISCOVERED
2026-03-21
PUBLISHED
2026-03-21
RELEVANCE
7/10
AUTHOR
kcksteve