NVLink boosts dual-3090 Qwen throughput

// 123d ago · BENCHMARK RESULT

A LocalLLaMA benchmark shows two RTX 3090s linked with NVLink materially outperform several non-NVLink topologies when running Qwen3.5 27B FP8. The posted results show faster single-stream generation, much higher aggregate throughput under concurrency, and sharply better prefill/TTFT, suggesting interconnect bandwidth still matters for serious multi-GPU local inference.

// ANALYSIS

This is a useful reality check for anyone assuming consumer multi-GPU inference is compute-bound first and topology-bound second.

– The NVLink setup hit 79.4 tok/s single-stream versus roughly 70-74 tok/s without NVLink
– Under 20 concurrent generations, throughput jumped to 693.2 tok/s versus about 493-542 tok/s on the non-NVLink layouts
– Prefill improved the most, rising to 2,181 tok/s with about 7.1s TTFT versus roughly 1,395-1,677 tok/s and 9.2-11.0s TTFT without NVLink
– The post’s PLX note is the real takeaway: on consumer cards, PCIe topology and peer-to-peer limits can erase a lot of the benefit of adding a second GPU
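
To put the figures above on one scale, here is a minimal Python sketch that converts them into relative gains. It uses only the numbers quoted in this summary and treats the endpoints of each "roughly X-Y" range as exact, which is an assumption. The names `RESULTS`, `gain_range`, and `ttft_reduction` are illustrative and do not come from the original post.

```python
# Figures as quoted in the summary: (NVLink, non-NVLink low, non-NVLink high).
# The non-NVLink values are approximate range endpoints, treated as exact here.
RESULTS = {
    "single-stream tok/s": (79.4, 70.0, 74.0),
    "20-way concurrent tok/s": (693.2, 493.0, 542.0),
    "prefill tok/s": (2181.0, 1395.0, 1677.0),
}


def gain_range(nvlink, base_low, base_high):
    """Return (min, max) percent throughput gain of NVLink over the baseline range."""
    smallest = (nvlink / base_high - 1) * 100  # against the best non-NVLink layout
    largest = (nvlink / base_low - 1) * 100    # against the worst non-NVLink layout
    return smallest, largest


def ttft_reduction(nvlink_s, base_low_s, base_high_s):
    """Return (min, max) percent reduction in time to first token (lower is better)."""
    smallest = (1 - nvlink_s / base_low_s) * 100
    largest = (1 - nvlink_s / base_high_s) * 100
    return smallest, largest


if __name__ == "__main__":
    for name, values in RESULTS.items():
        low, high = gain_range(*values)
        print(f"{name}: +{low:.1f}% to +{high:.1f}%")
    low, high = ttft_reduction(7.1, 9.2, 11.0)
    print(f"TTFT: -{low:.1f}% to -{high:.1f}%")
```

On these inputs the single-stream gain is roughly 7 to 13 percent, while the concurrent and prefill gains are roughly 28 to 41 percent and 30 to 56 percent. That matches the point that the interconnect matters most under batched and prefill-heavy load.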

// TAGS

nvlink · gpu · inference · benchmark · llm

DISCOVERED

123d ago

2026-03-11

PUBLISHED

123d ago

2026-03-11

RELEVANCE

8/10

AUTHOR

Conscious_Cut_6144
