LocalLLaMA thread weighs 5090, RTX Pro 6000
An r/LocalLLaMA thread asks which local models make sense for a box pairing a 5090 with an RTX Pro 6000, aimed at replacing paid coding models. The early advice points toward modern open coding models in the 20B-30B range first, with larger 128B-class options only if latency and bandwidth can tolerate them.
The GPU pair is impressive, but for coding assistants the real bottleneck is usually model quality, context handling, and serving efficiency, not just raw VRAM. This is a classic local-LLM reality check: bigger hardware expands the menu, but it does not automatically beat the best smaller code models.
- A 32GB 5090 is already enough for fast dense 20B-30B coding models with decent headroom for context and tool use
- The RTX Pro 6000 mainly buys flexibility for 70B+ or 128B-class runs, not a guarantee of better coding output
- Offloading to system RAM is a fallback, but it typically hurts latency enough to undermine the “replacement for paid models” goal
- PCIe bottlenecks matter less for inference than many people expect; serving stack, batching, and prompt length often dominate user experience
- The best test is real coding tasks, not token-per-second bragging rights, because agent quality and long-context reliability decide whether the setup is actually useful
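The sizing claims above can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is a rough estimate, not a measurement: it counts only weight memory plus a flat reserve for KV cache and activations. The 4.5 bits-per-weight figure (a typical 4-bit quantization with overhead), the 4 GiB reserve, and the 96 GB capacity for the RTX Pro 6000 are assumptions not stated in the thread summary; `weight_gib` and `fits` are hypothetical helper names.

```python
def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def fits(params_billion: float, bits_per_weight: float,
         vram_gib: float, reserve_gib: float = 4.0) -> bool:
    """True if weights plus a flat reserve fit in the given VRAM.

    reserve_gib is an assumed allowance for KV cache and activations;
    real usage grows with context length and batch size.
    """
    return weight_gib(params_billion, bits_per_weight) + reserve_gib <= vram_gib


if __name__ == "__main__":
    # (label, parameters in billions, bits per weight) -- illustrative only
    models = [("30B @ ~4.5 bpw", 30, 4.5),
              ("30B @ 16 bpw", 30, 16),
              ("70B @ ~4.5 bpw", 70, 4.5),
              ("128B @ ~4.5 bpw", 128, 4.5)]
    # 32 comes from the summary; 96 is an assumption about the RTX Pro 6000
    cards = [("5090", 32), ("RTX Pro 6000", 96)]
    for label, params, bpw in models:
        size = weight_gib(params, bpw)
        verdict = ", ".join(
            f"{name}: {'fits' if fits(params, bpw, vram) else 'no'}"
            for name, vram in cards)
        print(f"{label:>16}: {size:6.1f} GiB weights ({verdict})")
```

Under these assumptions, a quantized 30B model leaves roughly half of the 5090 free for context, while 70B and 128B-class weights only fit on the larger card. This is consistent with the advice above, but says nothing about output quality or speed.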
DISCOVERED
47d ago
2026-05-01
PUBLISHED
47d ago
2026-05-01
AUTHOR
rulerofthehell