OPEN_SOURCE
REDDIT // 7d ago · TUTORIAL
Qwen3.5 35B fits 5070 Ti, 32GB
A r/LocalLLaMA thread says a 16GB RTX 5070 Ti plus 32GB RAM can run models well beyond Qwen3.5 9B if you quantize aggressively and accept some CPU offload. The community consensus points to Qwen3.5 35B-A3B as the practical ceiling, with 27B dense as the slower backup.
// ANALYSIS
The real question here is not “what’s the biggest model” but “what size still feels usable once VRAM, RAM, context, and offload all compete for memory.” Qwen3.5’s MoE lineup makes that tradeoff friendlier than most dense models, but the speed cliff arrives fast once you lean on system RAM too hard.
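The weight side of that budget is easy to sketch. A minimal estimate, assuming a 35B-parameter model quantized at roughly 4.5 bits per weight (Q4_K_M-class) — both the parameter count and the bit rate here are illustrative assumptions, not figures from the thread:

```python
# Rough memory estimate for quantized model weights (assumed figures).
def weight_gib(params_b: float, bits_per_weight: float) -> float:
    """GiB needed to hold the quantized weights on disk/in memory."""
    return params_b * 1e9 * bits_per_weight / 8 / 2**30

# A 35B model at ~4.5 bits/weight (Q4_K_M-class quant, assumption):
total = weight_gib(35, 4.5)
print(f"35B @ 4.5 bpw: {total:.1f} GiB")  # ≈ 18.3 GiB — over a 16 GiB card by itself
```

The ~2 GiB that won't fit in VRAM is what llama.cpp-style CPU offload ends up serving from system RAM — manageable for an MoE with only ~3B active parameters per token, which is exactly why the thread favors 35B-A3B over a 27B dense model.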
- Community advice lands on 35B-A3B as the sensible upper bound for a 16GB GPU, not a 70B-class dense model
- Quantization is the difference between practical and painful; Q4/Q6 variants are where consumer setups usually live
- Context length matters because KV cache can eat the headroom you thought you had
- Swapping to HDD is the real risk to avoid, but staying under VRAM+RAM with margin usually prevents it
- For most local work, 9B to 35B MoE is the useful exploration band on this hardware
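The KV-cache point in particular is easy to underestimate. A sketch of the standard per-token formula, with layer and head counts chosen purely for illustration (not Qwen3.5's actual config):

```python
def kv_cache_gib(layers: int, kv_heads: int, head_dim: int,
                 context: int, bytes_per_elem: int = 2) -> float:
    """GiB for the K and V caches at full context (fp16 cache by default)."""
    return 2 * layers * kv_heads * head_dim * context * bytes_per_elem / 2**30

# Illustrative config: 48 layers, 8 KV heads (GQA), head_dim 128.
for ctx in (8192, 32768, 131072):
    print(f"{ctx:>6} ctx: {kv_cache_gib(48, 8, 128, ctx):.1f} GiB")
# Prints 1.5 GiB, 6.0 GiB, 24.0 GiB — at long context the cache alone
# can exceed the VRAM left over after the weights.
```

This is why a quant that "fits" at 8K context can still spill into system RAM, or worse, swap, once you raise the context window.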
// TAGS
qwen3-5 · llm · inference · gpu · self-hosted
DISCOVERED
2026-04-04
PUBLISHED
2026-04-04
RELEVANCE
8/10
AUTHOR
Ytliggrabb