REDDIT · OPEN_SOURCE · INFRASTRUCTURE · 4h ago

AMD GPUs Split Local LLM Workloads

A LocalLLaMA user asks whether an RX 7900 XTX and RX 6800 XT can pool VRAM for one model, or whether the better play is to split different AI tasks across both cards. The discussion lands on a practical answer: multi-GPU model sharding is possible in llama.cpp, but it is not the same as turning 24 GB + 16 GB into one seamless 40 GB pool.

// ANALYSIS

The real answer is “yes, sort of” for capacity, and “no” for clean pooled-memory performance. If you already own both cards, the smarter setup is usually workload separation: the big model on the 7900 XTX and auxiliary tasks on the 6800 XT, rather than expecting a free speed upgrade from splitting.
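
For the capacity half of that answer, here is a minimal llama-cpp-python sketch, assuming a ROCm/HIP build of llama.cpp; the model file and the 24:16 split ratio are illustrative placeholders, not details from the thread:

```python
# Sketch only: layer-split one GGUF model across the 7900 XTX (device 0) and
# the 6800 XT (device 1) using llama-cpp-python built for ROCm/HIP.
import llama_cpp

llm = llama_cpp.Llama(
    model_path="models/qwen2.5-32b-instruct-q4_k_m.gguf",  # placeholder model
    n_gpu_layers=-1,                                  # offload every layer to GPU
    split_mode=llama_cpp.LLAMA_SPLIT_MODE_LAYER,      # split by whole layers
    tensor_split=[24.0, 16.0],                        # weight layers by VRAM (24 GB / 16 GB)
    main_gpu=0,                                       # treat the 7900 XTX as the primary device
)

out = llm("Explain why splitting a model across GPUs adds overhead:", max_tokens=64)
print(out["choices"][0]["text"])
```

The tensor_split ratio only biases how many layers land on each card; activations still hop between devices on every forward pass, which is the overhead the bullets below refer to.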

  • llama.cpp supports multi-GPU split modes and tensor splitting, so mixed cards can cooperate on one model
  • That cooperation has overhead; if a model fits on one GPU, splitting it across two usually hurts throughput
  • The most useful pattern here is division of labor: embeddings, reranking, memory management, or a draft model on the smaller card (sketched after this list)
  • Power is the other constraint: an 850 W supply may work with aggressive power limits and undervolting, but there is not much headroom
  • This is an infrastructure story, not a model story; the win is fitting larger local models, not magically doubling performance
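
The division-of-labor pattern, sketched under the same assumptions: ROCm enumerates the cards by index, so an auxiliary model can be pinned to the 6800 XT with HIP_VISIBLE_DEVICES while the main model keeps the 7900 XTX to itself. The embedding model named here is a placeholder.

```python
# Sketch only: run an embedding model on the 6800 XT (ROCm device 1) in its own
# process, leaving the 7900 XTX (device 0) free for the main chat model.
import os

# Must be set before the HIP runtime initializes, i.e. before llama_cpp is imported.
os.environ["HIP_VISIBLE_DEVICES"] = "1"

import llama_cpp

embedder = llama_cpp.Llama(
    model_path="models/nomic-embed-text-v1.5.Q8_0.gguf",  # placeholder model
    embedding=True,      # embedding mode instead of text generation
    n_gpu_layers=-1,
    n_ctx=2048,
)

vec = embedder.embed("query text for a local RAG or reranking step")
print(len(vec))  # embedding dimensionality
```

The main model then runs as a separate process with HIP_VISIBLE_DEVICES=0 (or left unset), so each card owns one job and neither stalls waiting on the other.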
// TAGS
gpu · inference · self-hosted · llm · rx-7900-xtx · rx-6800-xt

DISCOVERED

4h ago

2026-04-27

PUBLISHED

7h ago

2026-04-27

RELEVANCE

6 / 10

AUTHOR

xeeff