OPEN_SOURCE
REDDIT // 6h ago · INFRASTRUCTURE
vLLM on mixed 3090 / modded 3080 rigs questioned
A LocalLLaMA user asks whether vLLM tensor parallelism can run across 3090s and modded 3080 20G cards. The appeal is cheap extra VRAM, but the thread centers on whether heterogeneous consumer GPUs are stable or efficient enough for real use.
// ANALYSIS
My read: this is a plausible hack, not a cleanly supported setup. vLLM's docs clearly cover multi-GPU tensor parallelism, but the project's public guidance does not promise smooth behavior with mixed cards, and a GitHub feature request explicitly calls out heterogeneous TP as a gap.
- vLLM is optimized for sharding model weights across GPUs, so the software path exists; the question is whether the hardware mix behaves well under NCCL and TP scheduling (see the sketch after this list).
- A 3090 and a modded 3080 20G are close in architecture, but the weaker card will still cap effective throughput and may waste some of the 3090's headroom.
- Mixed-GPU setups tend to be more fragile on PCIe-only consumer rigs, where communication overhead can erase the benefit of adding cheaper VRAM.
- If it works, it is more likely to be acceptable for capacity expansion than for maximizing tokens/sec.
- The discussion is useful because it reflects a common local-LLM tradeoff: buy matched expensive cards, or accept compatibility risk to stretch budget and memory.
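For concreteness, here is a minimal sketch of what the poster is attempting, using vLLM's standard Python API. The model name is a placeholder, and it assumes both cards are visible (e.g. CUDA_VISIBLE_DEVICES=0,1); nothing here guarantees that heterogeneous TP behaves well, which is exactly the thread's open question.

```python
import torch
from vllm import LLM, SamplingParams

# Confirm the mixed-VRAM layout before committing to tensor parallelism.
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"GPU {i}: {props.name}, {props.total_memory / 1e9:.1f} GB")

# Tensor parallelism across both visible cards. Weights are sharded
# evenly, so each shard has to fit on the *smaller* card; with a 24 GB
# 3090 and a 20 GB modded 3080, plan around the 20 GB ceiling.
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model
    tensor_parallel_size=2,
    gpu_memory_utilization=0.90,  # fraction of each GPU reserved; leaves headroom for activations
)

out = llm.generate(["Hello"], SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)
```

The design consequence is the one the bullets describe: even if this launches cleanly, the even TP split and NCCL collectives over PCIe mean the slower, smaller card paces every step, so the setup buys capacity more than it buys speed.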
// TAGS
vllm · llm · inference · gpu · open-source
DISCOVERED
6h ago
2026-05-01
PUBLISHED
9h ago
2026-04-30
RELEVANCE
7 / 10
AUTHOR
lblblllb