NVLink boosts dual-3090 Qwen throughput

// 77d ago · BENCHMARK RESULT

A LocalLLaMA benchmark shows two RTX 3090s linked with NVLink materially outperform several non-NVLink topologies when running Qwen3.5 27B FP8. The posted results show faster single-stream generation, much higher aggregate throughput under concurrency, and sharply better prefill/TTFT, suggesting interconnect bandwidth still matters for serious multi-GPU local inference.

// ANALYSIS

This is a useful reality check for anyone assuming consumer multi-GPU inference is compute-bound first and topology-bound second.

  • The NVLink setup hit 79.4 tok/s single-stream versus roughly 70-74 tok/s without NVLink
  • Under 20 concurrent generations, throughput jumped to 693.2 tok/s versus about 493-542 tok/s on the non-NVLink layouts
  • Prefill improved the most, rising to 2,181 tok/s with about 7.1s TTFT versus roughly 1,395-1,677 tok/s and 9.2-11.0s TTFT without NVLink
  • The post’s PLX (PCIe switch) note is the real takeaway: on consumer cards, PCIe topology and peer-to-peer limits can erase much of the benefit of adding a second GPU
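
To put the figures above on a common scale, the sketch below converts each NVLink number into a percentage gain over the quoted non-NVLink range. The function name `gain_range` and the dictionary layout are illustrative assumptions, not something from the original post; the inputs are the rounded values listed in the bullets, so the results are approximate.

```python
def gain_range(nvlink, base_low, base_high):
    """Return (min_gain, max_gain) in percent for a higher-is-better metric.

    The smallest gain is measured against the best non-NVLink result
    (base_high); the largest gain against the worst one (base_low).
    """
    min_gain = (nvlink / base_high - 1) * 100
    max_gain = (nvlink / base_low - 1) * 100
    return round(min_gain, 1), round(max_gain, 1)


# Throughput figures in tok/s, copied from the bullets above:
# (NVLink result, lowest non-NVLink result, highest non-NVLink result).
results = {
    "single-stream": (79.4, 70.0, 74.0),
    "20 concurrent": (693.2, 493.0, 542.0),
    "prefill": (2181.0, 1395.0, 1677.0),
}

gains = {name: gain_range(*values) for name, values in results.items()}

for name, (low, high) in gains.items():
    print(f"{name}: +{low}% to +{high}% with NVLink")
```

On these rounded inputs the gain is about 7-13% for single-stream generation, 28-41% under 20 concurrent generations, and 30-56% for prefill, which matches the observation that prefill benefits most.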

// TAGS
nvlink, gpu, inference, benchmark, llm

DISCOVERED

77d ago

2026-03-11

PUBLISHED

77d ago

2026-03-11

RELEVANCE

8/10

AUTHOR

Conscious_Cut_6144