Local LLMs hit RTX 5070 Ti limits
A beginner with an RTX 5070 Ti, Ryzen 9 9950X3D, and 64GB of RAM asks how far a local-LLM setup can stretch, and whether bumping to 112GB actually changes the ceiling. The thread lands on the usual truth: more RAM expands what you can load, but VRAM still decides what feels usable.
The real question here is not “what can fit,” but “what can still feel interactive.” Adding RAM helps you experiment with larger quants and CPU-offloaded models, but the 16GB GPU is still the bottleneck for anything that needs real throughput.
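To make the fit-versus-speed distinction concrete, here is a rough sketch of how much memory a model's weights need at a given quantization, and how much of that would overflow a 16GB card. The function names, the 4.5 bits-per-weight figure (a loose stand-in for a 4-bit quant plus overhead), and the 2GB reserve for KV cache and runtime buffers are assumptions for illustration, not numbers from the thread.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (10^9 bytes)."""
    return params_billion * bits_per_weight / 8


def spill_gb(params_billion, bits_per_weight=4.5, vram_gb=16.0, reserve_gb=2.0):
    """GB of weights that would not fit in VRAM and must live in system RAM.

    reserve_gb is an assumed allowance for KV cache and runtime buffers.
    """
    usable = vram_gb - reserve_gb
    return max(0.0, weight_gb(params_billion, bits_per_weight) - usable)


for size in (8, 14, 32, 70):
    print(f"{size}B: ~{weight_gb(size, 4.5):.1f} GB weights, "
          f"~{spill_gb(size):.1f} GB spilling to system RAM")
```

Under these assumptions, 8B and 14B models sit entirely in VRAM, while a 32B model already overflows by a few gigabytes. Longer contexts grow the KV cache and shrink the usable budget further.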
- The sane starter tier is still 8B to 14B instruct models; that’s where you get decent quality without turning every prompt into a waiting game.
- Qwen’s current family shows why 32B is the next psychological step up: its official dense sizes include 8B, 14B, and 32B, so there’s a clear midrange to target.
- 32B-class models can make sense with 112GB of system RAM, but only if you accept quantization and some CPU spillover; it’s a capability win, not a speed win.
- 70B-plus and 100B-plus models are technically possible in heavily quantized form, but on this class of hardware they start to feel like a demo of memory-bandwidth limits rather than a practical daily driver.
- The best first move is to benchmark a few 8B/14B models in something like LM Studio before buying more RAM; local-LLM taste is usually learned through usage, not from spec sheets.
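The speed side of these points can be sketched with a common back-of-envelope model: generating each token reads roughly all of the weights once, so decode speed is bounded by memory bandwidth. The sketch below is a minimal estimate under that assumption. The function name, the 14GB VRAM budget, and the bandwidth figures (900 GB/s for the GPU, 90 GB/s for system RAM) are illustrative placeholders rather than measured values for this machine, and real throughput will be lower.

```python
def max_tokens_per_sec(model_gb, vram_budget_gb, gpu_bw_gbs, ram_bw_gbs):
    """Upper bound on decode speed when weights are split across VRAM and RAM.

    Assumes each generated token reads every weight once, and that the
    GPU-resident and RAM-resident parts are read one after the other.
    """
    on_gpu = min(model_gb, vram_budget_gb)
    on_cpu = model_gb - on_gpu
    seconds_per_token = on_gpu / gpu_bw_gbs + on_cpu / ram_bw_gbs
    return 1.0 / seconds_per_token


# Placeholder figures (assumptions): substitute your own hardware's numbers.
GPU_BW, RAM_BW, VRAM_BUDGET = 900.0, 90.0, 14.0

for name, size_gb in [("8B", 4.5), ("32B", 18.0), ("70B", 39.4)]:
    tps = max_tokens_per_sec(size_gb, VRAM_BUDGET, GPU_BW, RAM_BW)
    print(f"{name} at ~4.5 bits/weight: at most ~{tps:.0f} tokens/s")
```

With these placeholder numbers, the few gigabytes of a 32B model that land in system RAM cost more time per token than everything resident on the GPU, which is why extra RAM raises what you can load without raising how fast it runs.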
Discovered: 2026-03-31
Published: 2026-03-30
Author: Woondas