OPEN_SOURCE · REDDIT // 2h ago · INFRASTRUCTURE

Dual RX9070s may boost Qwen3.6-27B throughput

This Reddit post is a practical hardware question from a LocalLLaMA user running Qwen3.6-27B at about 20 tokens per second on a single RX9070 16GB. The core issue is whether adding a second identical GPU would improve token generation speed or whether performance would stay roughly the same. The answer depends less on the model itself and more on whether the serving framework supports multi-GPU parallelism and whether the workload is bottlenecked by compute, memory bandwidth, or PCIe overhead.
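The memory-bandwidth side of that question can be checked with back-of-envelope arithmetic: during single-stream decode, every generated token must stream the full weight set from VRAM, so peak tokens per second is roughly bandwidth divided by weight bytes. The sketch below uses assumed numbers (roughly 640 GB/s peak bandwidth for an RX9070-class card, a 4-bit quant of a 27B model), not measurements from the post.

```python
# Back-of-envelope decode throughput for a memory-bandwidth-bound LLM.
# Assumed numbers: ~640 GB/s peak VRAM bandwidth for an RX9070-class
# card, and a 4-bit (0.5 bytes/param) quant of a 27B-parameter model.

def decode_tps_ceiling(params_billion, bytes_per_param, bandwidth_gbs):
    """Each decoded token streams all weights from VRAM once, so the
    theoretical ceiling is bandwidth / weight bytes."""
    weight_gb = params_billion * bytes_per_param
    return bandwidth_gbs / weight_gb

ceiling = decode_tps_ceiling(27, 0.5, 640)  # ~47 tok/s theoretical
observed = 20.0                             # figure from the post
print(f"ceiling ~{ceiling:.0f} tok/s, observed {observed} tok/s "
      f"({observed / ceiling:.0%} of peak)")
```

Under these assumptions the single card is reaching well under half of its bandwidth ceiling, which suggests the stack (kernels, quant format, ROCm path) matters at least as much as adding a second GPU.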

// ANALYSIS

Hot take: a second RX9070 is not an automatic 2x TPS upgrade; it helps only when the runtime is configured for tensor or pipeline parallelism, and even then gains can be modest on consumer GPUs.

  • If the model already fits comfortably on one 16GB card and you are running single-stream inference, a second GPU may do little for per-request TPS.
  • If the model needs sharding across GPUs, throughput can improve, but interconnect overhead often prevents linear scaling.
  • For local LLM serving, two GPUs usually help more with larger context, larger batch size, or serving multiple concurrent requests than with raw single-user speed.
  • The most useful benchmark is not “two GPUs vs one” in the abstract, but the exact serving stack: ROCm support, tensor parallel setting, PCIe lane layout, and whether the model is quantized.
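The scaling intuition in the bullets above can be sketched as a toy model: tensor parallelism splits the weights across cards, so aggregate bandwidth (and the ideal decode speedup) scales with GPU count, degraded by the fraction of each step spent in inter-GPU communication. The overhead fractions below are illustrative guesses, not measurements on this hardware.

```python
# Toy scaling model for tensor-parallel decode across identical GPUs.
# The communication-overhead fractions are illustrative assumptions;
# real per-token all-reduce cost over PCIe must be benchmarked.

def tp_tps(single_gpu_tps, n_gpus, comm_overhead_frac):
    """Ideal speedup is n_gpus (weights are sharded, so aggregate
    VRAM bandwidth scales), reduced by the fraction of each decode
    step spent in inter-GPU communication."""
    ideal = single_gpu_tps * n_gpus
    return ideal * (1.0 - comm_overhead_frac)

base = 20.0  # tok/s on one RX9070, from the post
for overhead in (0.10, 0.25, 0.40):
    print(f"comm overhead {overhead:.0%}: ~{tp_tps(base, 2, overhead):.0f} tok/s")
```

Even at 40% overhead the model predicts a gain over one card, but nowhere near 2x; this is why measuring the exact serving stack beats reasoning about "two GPUs vs one" in the abstract.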
// TAGS
local-llm · inference · multi-gpu · amd-gpu · rocm · qwen · performance · tps

DISCOVERED

2h ago · 2026-04-30

PUBLISHED

4h ago · 2026-04-30

RELEVANCE

7/10

AUTHOR

QuinsZouls