OPEN_SOURCE
REDDIT // 7d ago · TUTORIAL
Qwen3.5 35B fits 5070 Ti, 32GB
A r/LocalLLaMA thread says a 16GB RTX 5070 Ti plus 32GB RAM can run models well beyond Qwen3.5 9B if you quantize aggressively and accept some CPU offload. The community consensus points to Qwen3.5 35B-A3B as the practical ceiling, with 27B dense as the slower backup.
// ANALYSIS
The real question here is not “what’s the biggest model” but “what size still feels usable once VRAM, RAM, context, and offload all compete for memory.” Qwen3.5’s MoE lineup makes that tradeoff friendlier than most dense models, but the speed cliff arrives fast once you lean on system RAM too hard.
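The weight side of that budget is easy to sketch. A minimal estimate, assuming a 35B-parameter model quantized at roughly 4.5 bits per weight (Q4_K_M-class) — both the parameter count and the bit rate here are illustrative assumptions, not figures from the thread:

```python
# Rough memory estimate for quantized model weights (assumed figures).
def weight_gib(params_b: float, bits_per_weight: float) -> float:
    """GiB needed to hold the quantized weights on disk/in memory."""
    return params_b * 1e9 * bits_per_weight / 8 / 2**30

# A 35B model at ~4.5 bits/weight (Q4_K_M-class quant, assumption):
total = weight_gib(35, 4.5)
print(f"35B @ 4.5 bpw: {total:.1f} GiB")  # ≈ 18.3 GiB — over a 16 GiB card by itself
```

The ~2 GiB that won't fit in VRAM is what llama.cpp-style CPU offload ends up serving from system RAM — manageable for an MoE with only ~3B active parameters per token, which is exactly why the thread favors 35B-A3B over a 27B dense model.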
- Community advice lands on 35B-A3B as the sensible upper bound for a 16GB GPU, not a 70B-class dense model
- Quantization is the difference between practical and painful; Q4/Q6 variants are where consumer setups usually live
- Context length matters because KV cache can eat the headroom you thought you had
- Swapping to HDD is the real risk to avoid, but staying under VRAM+RAM with margin usually prevents it
- For most local work, 9B to 35B MoE is the useful exploration band on this hardware
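The KV-cache point in particular is easy to underestimate. A sketch of the standard per-token formula, with layer and head counts chosen purely for illustration (not Qwen3.5's actual config):

```python
def kv_cache_gib(layers: int, kv_heads: int, head_dim: int,
                 context: int, bytes_per_elem: int = 2) -> float:
    """GiB for the K and V caches at full context (fp16 cache by default)."""
    return 2 * layers * kv_heads * head_dim * context * bytes_per_elem / 2**30

# Illustrative config: 48 layers, 8 KV heads (GQA), head_dim 128.
for ctx in (8192, 32768, 131072):
    print(f"{ctx:>6} ctx: {kv_cache_gib(48, 8, 128, ctx):.1f} GiB")
# Prints 1.5 GiB, 6.0 GiB, 24.0 GiB — at long context the cache alone
# can exceed the VRAM left over after the weights.
```

This is why a quant that "fits" at 8K context can still spill into system RAM, or worse, swap, once you raise the context window.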
// TAGS
qwen3-5 · llm · inference · gpu · self-hosted
DISCOVERED
2026-04-04
PUBLISHED
2026-04-04
RELEVANCE
8/10
AUTHOR
Ytliggrabb