OPEN_SOURCE
REDDIT // 6h ago · INFRASTRUCTURE
vLLM on mixed 3090 / modded 3080 rigs questioned
A LocalLLaMA user asks whether vLLM tensor parallelism can run across 3090s and modded 3080 20G cards. The appeal is cheap extra VRAM, but the thread centers on whether heterogeneous consumer GPUs are stable or efficient enough for real use.
// ANALYSIS
My read: this is a plausible hack, not a cleanly supported setup. vLLM's docs clearly cover multi-GPU tensor parallelism, but the project's public guidance does not promise smooth behavior with mixed cards, and a GitHub feature request explicitly calls out heterogeneous TP as a gap.
- vLLM is optimized for sharding model weights across GPUs, so the software path exists; the question is whether the hardware mix behaves well under NCCL and TP scheduling (see the sketch after this list).
- A 3090 and a modded 3080 20G are close in architecture, but the weaker card will still cap effective throughput and may waste some of the 3090's headroom.
- Mixed-GPU setups tend to be more fragile on PCIe-only consumer rigs, where communication overhead can erase the benefit of adding cheaper VRAM.
- If it works, it is more likely to be acceptable for capacity expansion than for maximizing tokens/sec.
- The discussion is useful because it reflects a common local-LLM tradeoff: buy matched expensive cards, or accept compatibility risk to stretch budget and memory.
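For concreteness, here is a minimal sketch of what the poster is attempting, using vLLM's standard Python API. The model name is a placeholder, and it assumes both cards are visible (e.g. CUDA_VISIBLE_DEVICES=0,1); nothing here guarantees that heterogeneous TP behaves well, which is exactly the thread's open question.

```python
import torch
from vllm import LLM, SamplingParams

# Confirm the mixed-VRAM layout before committing to tensor parallelism.
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"GPU {i}: {props.name}, {props.total_memory / 1e9:.1f} GB")

# Tensor parallelism across both visible cards. Weights are sharded
# evenly, so each shard has to fit on the *smaller* card; with a 24 GB
# 3090 and a 20 GB modded 3080, plan around the 20 GB ceiling.
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model
    tensor_parallel_size=2,
    gpu_memory_utilization=0.90,  # fraction of each GPU reserved; leaves headroom for activations
)

out = llm.generate(["Hello"], SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)
```

The design consequence is the one the bullets describe: even if this launches cleanly, the even TP split and NCCL collectives over PCIe mean the slower, smaller card paces every step, so the setup buys capacity more than it buys speed.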
// TAGS
vllm · llm · inference · gpu · open-source
DISCOVERED
6h ago
2026-05-01
PUBLISHED
9h ago
2026-04-30
RELEVANCE
7 / 10
AUTHOR
lblblllb