Dual 3090 PC weighs local agents
OPEN_SOURCE
REDDIT // 7h ago // INFRASTRUCTURE

A LocalLLaMA user is planning a fully offline agentic coding setup on dual RTX 3090s with 128GB RAM, weighing Qwen models, vLLM, llama.cpp, coding agents, and speech-to-text. The thread reflects a broader shift from “can I run a local LLM?” to “can I run a useful private coding agent stack?”

// ANALYSIS

The practical answer is less about the biggest model and more about keeping latency, context, and tool-calling reliable enough for daily coding.

  • Dual 3090s give 48GB of pooled VRAM, which is strongest for 30B-35B-class coding models at modest quantization; 70B-class models fit only at heavier quantization, with more painful speed/quality tradeoffs
  • vLLM is the better default when throughput, batching, long context, and OpenAI-compatible serving matter; llama.cpp still wins for GGUF simplicity, hybrid CPU/GPU offload, and quick experimentation
  • Qwen3.5/Qwen3.6 27B-35B class models are the likely sweet spot for local coding agents, while 100B+ “orchestrator” setups risk becoming slow demos unless the workflow tolerates latency
  • Agent harness choice matters as much as model choice: OpenCode, Cline-style flows, Claude Code-compatible adapters, and local OpenAI APIs are where the stack either becomes productive or collapses into prompt fiddling
  • Adding Whisper or whisper.cpp for local STT makes sense, but it is secondary to nailing inference stability, context length, and tool-call correctness first
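The 48GB budget above is easy to sanity-check: a dense model's weight footprint is roughly parameter count times bits per weight, before KV cache and runtime overhead. A back-of-envelope sketch (the model sizes and quantization levels are illustrative, not benchmarks):

```python
# Rough VRAM estimate for dense model weights at a given quantization.
# Back-of-envelope only: real usage adds KV cache, activations, and
# runtime overhead, so leave headroom below the 48 GB ceiling.
def weight_vram_gb(params_b: float, bits_per_weight: float) -> float:
    # params (billions) * bits / 8 bits-per-byte = gigabytes of weights
    return params_b * bits_per_weight / 8

for name, params in [("32B coder", 32), ("70B", 70)]:
    for bits in (16, 8, 4):
        gb = weight_vram_gb(params, bits)
        verdict = "fits" if gb < 48 else "does not fit"
        print(f"{name} @ {bits}-bit: ~{gb:.0f} GB weights -> {verdict} in 48 GB")
```

This is why the thread's consensus lands where it does: a 32B model at 8-bit leaves room for context, while 70B only squeezes in at 4-bit with little headroom.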
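Tool-call correctness is where local agent stacks most often break down: smaller models emit malformed JSON arguments or invent tool names. A minimal validation guard in the harness, sketched here with a hypothetical `read_file` tool schema, catches both failure modes before anything executes:

```python
import json

# Hypothetical tool registry: each tool lists the argument keys it requires.
TOOLS = {"read_file": {"required": ["path"]}}

def validate_tool_call(name: str, arguments: str):
    """Return parsed arguments if the tool call is usable, else None."""
    if name not in TOOLS:
        return None  # hallucinated tool name: reject rather than guess
    try:
        args = json.loads(arguments)
    except json.JSONDecodeError:
        return None  # malformed JSON, a common local-model failure mode
    if not all(key in args for key in TOOLS[name]["required"]):
        return None  # parses, but required arguments are missing
    return args

print(validate_tool_call("read_file", '{"path": "main.py"}'))
print(validate_tool_call("read_file", "{broken"))
```

Rejected calls can be fed back to the model as an error message, which is usually enough to get a corrected retry instead of a silently wrong action.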
// TAGS
local-agentic-coding-workstation · ai-coding · agent · inference · gpu · self-hosted · open-weights · speech

DISCOVERED

7h ago

2026-04-21

PUBLISHED

11h ago

2026-04-21

RELEVANCE

7/10

AUTHOR

youcloudsofdoom