OpenCode user weighs RTX 3090 swap
OPEN_SOURCE
REDDIT // 3h ago · INFRASTRUCTURE

The post describes a developer running OpenCode with a local Qwen3.6-35B-A3B model through llama.cpp on a Tesla P40 + T4 pair, and asks whether swapping the P40 for an RTX 3090 is worth the cost. The current setup already delivers about 25-30 tokens/sec with 256k context, so the upgrade question is really about lowering latency and improving compatibility, not adding capacity.

// ANALYSIS

The 3090 is the right kind of upgrade if the goal is faster inference, but it is not a free win: the big gain comes from Ampere tensor hardware, higher memory bandwidth, and modern CUDA support, while the T4 and PCIe 3.0 host still limit how far the stack can scale.
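The bandwidth argument can be sanity-checked with a rough roofline estimate: token generation is usually memory-bandwidth-bound, so an upper bound on decode speed is bandwidth divided by the bytes of active weights read per token. The sketch below uses spec-sheet bandwidths and assumes roughly 3B active parameters at a ~4-bit quantization; these are illustrative assumptions, and real throughput sits well below the ceiling due to multi-GPU splits, attention over long context, and kernel overhead.

```python
# Roofline-style ceiling for decode throughput: tok/s <= bandwidth / bytes
# of active weights touched per token. Numbers are illustrative assumptions.

def decode_ceiling_tps(bandwidth_gb_s: float, active_params_b: float,
                       bytes_per_param: float = 0.56) -> float:
    """Upper bound on tokens/sec if every active weight is read once per token.

    bytes_per_param ~0.56 approximates a 4-bit quant including overhead.
    """
    bytes_per_token = active_params_b * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / bytes_per_token

# Spec-sheet memory bandwidths (GB/s): Tesla P40 ~347, RTX 3090 ~936.
for name, bw in [("Tesla P40", 347), ("RTX 3090", 936)]:
    print(f"{name}: decode ceiling ~= {decode_ceiling_tps(bw, 3.0):.0f} tok/s")
```

The point is not the absolute numbers but the ratio: the 3090's ~2.7x bandwidth advantage raises the ceiling by the same factor, which is where most of the latency gain would come from.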

  • RTX 3090 brings Ampere, 3rd gen Tensor Cores, and 24GB of GDDR6X, which is much better suited to current LLM inference than a Pascal-era P40.
  • The P40 was built for INT8 throughput, but it lacks the newer acceleration paths and software headroom that make today’s local coding setups smoother to run and maintain.
  • For a long-context workload like Qwen3.6-35B-A3B at 256k tokens, the KV cache and layer placement matter as much as raw VRAM, so real-world gains may be smaller than benchmark hype suggests.
  • In a 2U DL380 G9 with a hard budget cap, a blower 3090 is a pragmatic upgrade path, but the best value would still come from the fastest single GPU the chassis can physically and thermally tolerate.
  • The broader signal is strong: local coding agents are now good enough that users are optimizing home inference rigs the way others tune gaming PCs.
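The KV-cache point in the bullets above is easy to quantify with a back-of-envelope calculation. The layer counts and head shapes below are hypothetical placeholders, not Qwen's published configuration, but they show why a 256k-token cache can rival the 24GB of VRAM on either card:

```python
# Back-of-envelope KV-cache size for a long-context run.
# Model shape below is a hypothetical GQA config, chosen only to illustrate scale.

def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context: int, bytes_per_elem: int = 2) -> int:
    # Factor of 2 covers the separate K and V tensors; fp16 by default.
    return 2 * n_layers * n_kv_heads * head_dim * context * bytes_per_elem

# Hypothetical config: 48 layers, 4 KV heads (GQA), head_dim 128, fp16 cache.
full = kv_cache_bytes(48, 4, 128, 256_000)
print(f"256k context: {full / 1024**3:.1f} GiB of KV cache")
```

At these assumed shapes the cache alone lands in the ~23 GiB range, which is why cache quantization (llama.cpp's `--cache-type-k` / `--cache-type-v` options) and careful layer placement across the two GPUs matter as much as the headline VRAM figure.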
// TAGS
opencode · llama-cpp · qwen · ai-coding · inference · gpu · self-hosted · llm

DISCOVERED

3h ago

2026-04-25

PUBLISHED

4h ago

2026-04-24

RELEVANCE

7/10

AUTHOR

RoroTitiFR