OPEN_SOURCE ↗
REDDIT // 3h ago // INFRASTRUCTURE
OpenCode user weighs RTX 3090 swap
The post describes a developer running OpenCode with a local Qwen3.6-35B-A3B model through llama.cpp on a Tesla P40 + T4 pair, and asks whether swapping the P40 for an RTX 3090 is worth the cost. The current setup already delivers about 25-30 tokens/sec with 256k context, so the upgrade question is really about lowering latency and improving compatibility, not adding capacity.
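A setup like the one described would typically be launched through llama.cpp's server, splitting the model across the two cards. A minimal sketch, assuming a quantized GGUF file and a VRAM-proportional split — the model filename, split ratio, and port are illustrative assumptions, not details from the post:

```shell
# Hypothetical llama.cpp launch for a P40 (24GB) + T4 (16GB) pair.
# Filename and split ratio are assumptions, not from the post.
./llama-server \
  -m models/qwen-35b-a3b-q4_k_m.gguf \
  -c 262144 \
  -ngl 99 \
  --tensor-split 24,16 \
  --port 8080
```

Here `-c 262144` requests the 256k context window, `-ngl 99` offloads all layers to GPU, and `--tensor-split 24,16` distributes weights roughly in proportion to each card's VRAM.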
// ANALYSIS
The 3090 is the right kind of upgrade if the goal is faster inference, but it is not a free win: the big gain comes from Ampere tensor hardware, higher memory bandwidth, and modern CUDA support, while the T4 and PCIe 3.0 host still limit how far the stack can scale.
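The bandwidth argument can be made concrete with rounded spec-sheet numbers (public specifications, not measurements from the post):

```python
# Published memory-bandwidth figures, GB/s (rounded spec-sheet values).
P40_BW = 346      # Tesla P40, GDDR5
T4_BW = 320       # Tesla T4, GDDR6
RTX3090_BW = 936  # RTX 3090, GDDR6X
PCIE3_X16 = 15.8  # PCIe 3.0 x16 host link, shared by all transfers

# Token generation is largely memory-bandwidth-bound, so this ratio
# roughly bounds the per-GPU speedup from swapping the P40 for a 3090.
print(f"3090 vs P40 bandwidth: {RTX3090_BW / P40_BW:.1f}x")
print(f"3090 VRAM vs PCIe 3.0 x16: {RTX3090_BW / PCIE3_X16:.0f}x")
```

The ~2.7x bandwidth jump over the P40 is where the real gain lives; the last line shows why anything forced over the PCIe 3.0 link (e.g. layers split to the T4) dilutes it.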
- The RTX 3090 brings Ampere, 3rd-gen Tensor Cores, and 24GB of GDDR6X, which is far better suited to current LLM inference than the Pascal-era P40.
- The P40 was built for INT8 throughput, but it lacks the newer acceleration paths and software headroom that make today's local coding setups smoother to run and maintain.
- For a long-context workload like Qwen3.6-35B-A3B at 256k tokens, KV-cache size and layer placement matter as much as raw VRAM, so real-world gains may be smaller than benchmarks suggest.
- In a 2U DL380 G9 with a hard budget cap, a blower-style 3090 is a pragmatic upgrade path, but the best value still comes from the fastest single GPU the chassis can physically and thermally tolerate.
- The broader signal is strong: local coding agents are now good enough that users are optimizing home inference rigs the way others tune gaming PCs.
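The KV-cache point above is easy to quantify. A sketch of the arithmetic, assuming an illustrative GQA configuration (48 layers, 4 KV heads, head dim 128 — placeholder values, not the actual Qwen config) with an f16 cache:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx_tokens, bytes_per_elem=2):
    """KV-cache size: K and V each store one head_dim vector per KV head,
    per layer, per token."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * ctx_tokens

# Placeholder GQA config (NOT the real model's numbers) at the post's 256k context.
size = kv_cache_bytes(n_layers=48, n_kv_heads=4, head_dim=128, ctx_tokens=256 * 1024)
print(f"{size / 2**30:.0f} GiB")  # 24 GiB at f16; a q8_0 KV cache would roughly halve it
```

Even under modest GQA assumptions, a 256k cache eats a large share of a 24GB card, which is why quantized KV caches and careful layer placement matter as much as the GPU swap itself.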
// TAGS
opencode · llama-cpp · qwen · ai-coding · inference · gpu · self-hosted · llm
DISCOVERED
3h ago
2026-04-25
PUBLISHED
4h ago
2026-04-24
RELEVANCE
7/10
AUTHOR
RoroTitiFR