OPEN_SOURCE ↗
REDDIT // 7d ago // TUTORIAL

Qwen3.5 35B fits 5070 Ti, 32GB

An r/LocalLLaMA thread argues that a 16GB RTX 5070 Ti plus 32GB of system RAM can run models well beyond Qwen3.5 9B if you quantize aggressively and accept some CPU offload. The community consensus points to Qwen3.5 35B-A3B as the practical ceiling, with the 27B dense model as a slower fallback.
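The fit claim is easy to sanity-check with back-of-envelope arithmetic. A minimal sketch, assuming typical GGUF bits-per-weight figures (~4.8 for Q4_K_M-style and ~6.6 for Q6_K-style quants; these are approximations, not exact values for any specific Qwen3.5 build):

```python
# Rough memory-footprint estimate for a quantized model, to sanity-check
# whether it fits in 16 GiB VRAM + 32 GiB system RAM.

GIB = 1024**3

def model_size_gib(params_billions: float, bits_per_weight: float) -> float:
    """Approximate in-memory size of the quantized weights alone."""
    return params_billions * 1e9 * bits_per_weight / 8 / GIB

q4 = model_size_gib(35, 4.8)   # ~19.6 GiB
q6 = model_size_gib(35, 6.6)   # ~26.9 GiB

VRAM_GIB = 16
for name, size in [("Q4", q4), ("Q6", q6)]:
    # Weights that do not fit in VRAM must be offloaded to system RAM
    spill = max(0.0, size - VRAM_GIB)
    print(f"{name}: {size:.1f} GiB total, ~{spill:.1f} GiB offloaded to RAM")
```

Both quant levels overflow 16 GiB of VRAM, which is why the thread's claim hinges on CPU offload; note this counts weights only, before KV cache and runtime overhead.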

// ANALYSIS

The real question here is not “what’s the biggest model” but “what size still feels usable once VRAM, RAM, context, and offload all compete for memory.” Qwen3.5’s MoE lineup makes that tradeoff friendlier than most dense models, but the speed cliff arrives fast once you lean on system RAM too hard.

  • Community advice lands on 35B-A3B as the sensible upper bound for a 16GB GPU, not a 70B-class dense model
  • Quantization is the difference between practical and painful; Q4/Q6 variants are where consumer setups usually live
  • Context length matters because KV cache can eat the headroom you thought you had
  • Swapping to disk is the failure mode to avoid; staying under combined VRAM + RAM with some margin usually prevents it
  • For most local work, 9B to 35B MoE is the useful exploration band on this hardware
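The KV-cache point above can be made concrete. A sketch with placeholder architecture numbers (layer count, KV-head count, and head dimension are illustrative assumptions, not Qwen3.5's actual config):

```python
# KV-cache size grows linearly with context length and can consume the
# headroom left after the weights are loaded.

GIB = 1024**3

def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 ctx_len: int, bytes_per_elem: int = 2) -> float:
    """K and V tensors for every layer, fp16 by default (2 bytes/element)."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem / GIB

# Hypothetical mid-size config: 48 layers, 8 KV heads (GQA), head_dim 128
for ctx in (8_192, 32_768, 131_072):
    print(f"{ctx:>7} tokens -> {kv_cache_gib(48, 8, 128, ctx):.1f} GiB KV cache")
# prints 1.5, 6.0, and 24.0 GiB respectively
```

Even with grouped-query attention keeping the per-token cost down, long contexts quickly claim gigabytes, so a model that "fits" at 8K tokens may not at 128K.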
// TAGS
qwen3-5 · llm · inference · gpu · self-hosted

DISCOVERED

7d ago

2026-04-04

PUBLISHED

7d ago

2026-04-04

RELEVANCE

8 / 10

AUTHOR

Ytliggrabb