OPEN_SOURCE ↗
REDDIT // 2h ago · INFRASTRUCTURE
LocalLLaMA weighs used 3090s for Gemma 4
A Reddit thread on r/LocalLLaMA asks what GPU makes sense for running Gemma 4 locally for coding and chat on a roughly $700 budget. The consensus leans toward a used RTX 3090, with 24GB AMD and 32GB Intel options mentioned as alternatives, though Google’s current Gemma 4 family actually comprises 2B, 4B, 26B MoE, and 31B dense models rather than the 20B model the thread assumes.
// ANALYSIS
This is the classic local-LLM reality check: VRAM, software support, and context headroom matter more than raw spec-sheet excitement. For a first serious Gemma 4 box, a used 3090 is still the pragmatic answer even if it stretches the budget.
- Google’s Gemma 4 announcement positions the 26B MoE and 31B dense models as local-capable on consumer GPUs when quantized, but 24GB cards will still feel tight once you factor in long context and KV cache.
- Used RTX 3090s remain the safest bet because CUDA support is mature across Ollama, llama.cpp, vLLM, and the rest of the local inference stack.
- AMD’s 7900 XTX is the cleanest fallback if you want 24GB and better availability, but ROCm support is still less frictionless than Nvidia for hobbyist local LLM setups.
- Intel’s Arc Pro B70 looks compelling on paper with 32GB and vGPU/SR-IOV support, but the ecosystem is still immature enough that it’s a riskier starter card.
- The server-side constraints matter here too: PCIe 3.0, Windows VM passthrough, and SolidWorks/RDP usage all push this toward a “works reliably” choice over a “best theoretical value” choice.
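The KV-cache point above can be sketched with a rough memory estimate. The layer and head counts below are illustrative assumptions for a ~30B-class dense model with grouped-query attention, not published Gemma 4 dimensions:

```python
# Back-of-envelope KV-cache size for a dense transformer at a given
# context length. 2x accounts for keys and values; one cache entry
# per layer per token, stored at bytes_per_elem precision (2 = fp16).
def kv_cache_gib(layers, kv_heads, head_dim, context_len, bytes_per_elem=2):
    total = 2 * layers * kv_heads * head_dim * context_len * bytes_per_elem
    return total / 1024**3

# Assumed shape: 60 layers, 8 KV heads of dim 128, fp16 cache.
print(round(kv_cache_gib(60, 8, 128, 32_768), 1))  # → 7.5 GiB at 32k context
```

On those assumed dimensions, a 32k context alone eats roughly 7.5 GiB of a 24GB card before weights, which is why quantized ~30B models leave so little headroom.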
// TAGS
gemma-4 · llm · gpu · inference · self-hosted · ai-coding
DISCOVERED
2h ago
2026-04-19
PUBLISHED
5h ago
2026-04-19
RELEVANCE
7/10
AUTHOR
Kaibsora