Dual RX9070s may boost Qwen3.6-27B throughput
This Reddit post is a practical hardware question from a LocalLLaMA user running Qwen3.6-27B at about 20 tokens per second on a single 16GB RX9070. The core question is whether adding a second identical GPU would improve token generation speed or leave it roughly unchanged. The answer depends less on the model itself than on whether the serving framework supports multi-GPU parallelism and whether the workload is bottlenecked by compute, memory bandwidth, or PCIe overhead.
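Single-stream decode is usually memory-bandwidth-bound: every generated token streams the full weight set through the memory bus, so a quick ceiling estimate is just weight bytes divided by achievable bandwidth. A minimal sketch, with all numbers assumed for illustration (quant size, bandwidth, and efficiency are not from the post):

```python
# Back-of-envelope decode ceiling for single-stream inference, which is
# typically memory-bandwidth-bound. All numbers below are illustrative
# assumptions, not figures from the post.

params = 27e9              # ~27B parameters
bytes_per_param = 0.56     # assumed ~4.5-bit quantization (Q4_K_M-style)
weight_bytes = params * bytes_per_param            # ~15 GB of weights

bandwidth = 640e9          # assumed ~640 GB/s peak for an RX9070-class card
efficiency = 0.6           # assumed fraction of peak reached in practice

tps_ceiling = efficiency * bandwidth / weight_bytes
print(f"approx. single-GPU decode ceiling: {tps_ceiling:.0f} tok/s")
# -> roughly 25 tok/s under these assumptions
```

Under these assumed numbers the ceiling lands near the ~20 TPS the poster reports, which is consistent with a bandwidth bottleneck; that is the one case where a second card sharing the weight reads can genuinely raise single-user speed.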
Hot take: a second RX9070 is not an automatic 2x TPS upgrade; it helps only when the runtime is configured for tensor or pipeline parallelism, and even then gains can be modest on consumer GPUs.
- If the model already fits comfortably on one 16GB card and you are running single-stream inference, a second GPU may do little for per-request TPS.
- If the model needs sharding across GPUs, throughput can improve, but interconnect overhead often prevents linear scaling.
- For local LLM serving, two GPUs usually help more with larger context, larger batch size, or serving multiple concurrent requests than with raw single-user speed.
- The most useful benchmark is not "two GPUs vs one" in the abstract, but the exact serving stack: ROCm support, tensor parallel setting, PCIe lane layout, and whether the model is quantized (a minimal launch sketch follows this list).
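For reference, "configured for tensor parallelism" looks roughly like the vLLM launch below. This is a sketch under loose assumptions: the model path is a placeholder, and vLLM's ROCm builds officially target Instinct GPUs, so whether this runs at all on consumer RDNA cards like the RX9070 is itself part of what needs benchmarking.

```python
from vllm import LLM, SamplingParams

# Shard the model across both GPUs with tensor parallelism: each layer's
# weight matrices are split, so each card streams roughly half the weights
# per token, at the cost of an all-reduce over PCIe after each layer.
llm = LLM(
    model="/models/qwen-27b-awq",  # placeholder path, not from the post
    tensor_parallel_size=2,        # one shard per RX9070
    gpu_memory_utilization=0.90,   # leave headroom for activations/KV cache
)

outputs = llm.generate(
    ["Why doesn't tensor parallelism scale linearly over PCIe?"],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```

In llama.cpp terms, the analogous knob is `--split-mode row` rather than the default layer split, which spreads VRAM across cards but keeps per-token work largely serialized.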
DISCOVERED: 2026-04-30
PUBLISHED: 2026-04-30
AUTHOR: QuinsZouls