OPEN_SOURCE ↗
REDDIT // 14h ago · MODEL RELEASE
Gemma 4, Qwen3.5 vie for 36GB VRAM
A LocalLLaMA user running a 3090 Ti (24GB), soon to be paired with a 3080 Ti (12GB) for 36GB of total VRAM, wants a local model that feels less cramped than Gemma 4 26B at Q4 in LM Studio. The thread shifts the question from raw VRAM to which 26B-35B-class model actually stays fast, stable, and agent-friendly on that setup.
// ANALYSIS
36GB is the point where quantization and KV-cache behavior matter more than the headline parameter count (the budget sketch after the list below makes the arithmetic concrete); Qwen3.5 looks like the safer default for interactive agent work, while Gemma 4 is the more ambitious pick if you can tolerate heavier memory pressure.
- Gemma 4 is Google’s most capable open family so far, with native tool use, multimodal input, and up to 256K context, but that also makes local inference easier to bottleneck on cache and latency.
- Qwen3.5 has a broader release surface and stronger local-fit options around the 35B-A3B class, which tends to be easier to keep responsive on consumer GPUs.
- For LM Studio and OpenClaw, the practical sweet spot is usually a quantized 26B-35B model with conservative context settings, not the biggest model you can technically load.
- If the priority is agentic reliability and speed, Qwen3.5 is the pragmatic bet; if the priority is raw capability and you can tune around memory costs, Gemma 4 stays compelling.
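To see why context settings dominate at this scale, here is a back-of-envelope VRAM budget in Python. The layer count, KV-head count, head dimension, and bits-per-weight figures are illustrative placeholders, not the published Gemma 4 or Qwen3.5 configurations; substitute the real values from each model's config before drawing conclusions.

```python
# Rough VRAM budget for a quantized ~30B model on a 36GB setup.
# All architecture numbers are ILLUSTRATIVE ASSUMPTIONS, not the real
# Gemma 4 / Qwen3.5 configs -- swap in values from the model's config.json.

def weight_gib(n_params_b: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GiB."""
    return n_params_b * 1e9 * bits_per_weight / 8 / 2**30

def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 ctx_len: int, bytes_per_elem: int = 2) -> float:
    """KV cache = 2 (K and V) x layers x kv_heads x head_dim x tokens x elem size."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem / 2**30

# Placeholder grouped-query-attention config for a ~30B dense model.
LAYERS, KV_HEADS, HEAD_DIM = 60, 8, 128

w = weight_gib(30, 4.5)  # Q4_K_M-style quants average roughly 4.5 bits/weight
print(f"weights @ ~Q4: {w:.1f} GiB")
for ctx in (8_192, 32_768, 131_072):
    kv = kv_cache_gib(LAYERS, KV_HEADS, HEAD_DIM, ctx)
    verdict = "fits" if w + kv <= 36 else "OVER BUDGET"
    # Real budgets also need headroom for activations, CUDA context, and display.
    print(f"ctx {ctx:>7,}: KV {kv:5.1f} GiB, total {w + kv:5.1f} / 36 GiB -> {verdict}")
```

At these placeholder numbers the Q4 weights alone fit easily (~16 GiB), a 32K context adds a manageable ~7.5 GiB of KV cache, but a six-figure context makes the cache larger than the weights and blows the 36GB budget, which is exactly why the conservative-context advice above matters. Note also that a MoE in the 35B-A3B class still needs all ~35B weights resident; the ~3B active parameters cut compute per token, not weight memory.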
// TAGS
gemma-4 · qwen3-5 · llm · open-weights · inference · gpu
DISCOVERED
14h ago
2026-04-17
PUBLISHED
15h ago
2026-04-17
RELEVANCE
8/10
AUTHOR
choicechoi