llama.cpp splits LLMs across GPUs

// 62d ago · INFRASTRUCTURE

The post asks whether two P106-100 6GB mining cards can be combined to run Llama 3 8B as one local model with 12GB of effective VRAM. The community answer is that multi-GPU splitting is possible, but the real limits are runtime support, interconnect overhead, and how much context you try to keep resident.

// ANALYSIS

This is a classic local-LLM scaling question: yes, you can shard weights across GPUs, but on cheap older cards the upgrade buys capacity first, speed second.

– `llama.cpp` supports multi-GPU splitting (controlled by its `--split-mode` and `--tensor-split` options), and vLLM documents single-node tensor parallelism for models that do not fit on one card.
– On Pascal-era mining GPUs, PCIe communication can become the bottleneck, so two cards rarely behave like one clean 12GB GPU.
– Quantization and context size matter as much as raw weights; the KV cache grows with context length and can eat the headroom you thought you gained.
– If the stack is plain Transformers, `device_map="auto"` is not the same thing as true tensor parallelism: it places whole layers on different devices and runs them one after another, so the second card adds capacity but not parallel compute.
– For this use case, the smarter tradeoff is often a smaller quantized model before buying a second used GPU.
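
To put rough numbers on the capacity and KV-cache points above, here is a minimal back-of-the-envelope sketch. It is not how `llama.cpp` allocates memory. The layer and head counts are Llama 3 8B's published shape; everything else is an assumption made for illustration: the bits-per-weight figures are approximations for 4-bit and 8-bit quantization, the 0.5 GiB per-GPU overhead is a guess, the split across cards is assumed to be perfectly even, and the helper names (`ModelShape`, `fits`) are hypothetical.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass
class ModelShape:
    n_params: float   # total parameter count
    n_layers: int     # transformer blocks
    n_kv_heads: int   # key/value heads (grouped-query attention)
    head_dim: int     # dimension per attention head


# Llama 3 8B shape; the parameter count is rounded.
LLAMA3_8B = ModelShape(n_params=8.03e9, n_layers=32, n_kv_heads=8, head_dim=128)


def weight_bytes(shape: ModelShape, bits_per_weight: float) -> float:
    """Approximate size of the weights at a given average bits per weight."""
    return shape.n_params * bits_per_weight / 8


def kv_cache_bytes(shape: ModelShape, n_ctx: int, bytes_per_elem: int = 2) -> int:
    """KV cache size: keys and values, per layer, per KV head, per token."""
    per_token = 2 * shape.n_layers * shape.n_kv_heads * shape.head_dim * bytes_per_elem
    return per_token * n_ctx


def fits(shape, bits_per_weight, n_ctx, vram_gib_per_gpu, n_gpus,
         overhead_gib_per_gpu=0.5):
    """Return (fits, needed GiB per GPU), assuming an idealized even split.

    overhead_gib_per_gpu is an assumed allowance for compute buffers and
    the runtime itself, not a measured value.
    """
    total = weight_bytes(shape, bits_per_weight) + kv_cache_bytes(shape, n_ctx)
    per_gpu_gib = total / n_gpus / GIB + overhead_gib_per_gpu
    return per_gpu_gib <= vram_gib_per_gpu, per_gpu_gib


if __name__ == "__main__":
    # Assumed average bits per weight for each format (approximate).
    for label, bpw in [("fp16", 16), ("~8-bit", 8.5), ("~4-bit", 4.5)]:
        for n_gpus in (1, 2):
            ok, need = fits(LLAMA3_8B, bpw, n_ctx=8192,
                            vram_gib_per_gpu=6, n_gpus=n_gpus)
            print(f"{label:7s} x{n_gpus} GPU: {need:5.2f} GiB per card, fits={ok}")
```

Under these assumptions, an fp16 copy of the model does not fit even across both 6GB cards, an 8-bit quantization needs the second card, and a 4-bit quantization squeezes onto one card with little room to spare. Real cards expose somewhat less than their nominal capacity, so treat the margins as optimistic.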

// TAGS

llama-cpp · llm · inference · gpu · self-hosted · open-source

DISCOVERED

62d ago

2026-04-09

PUBLISHED

62d ago

2026-04-09

RELEVANCE

8/10

AUTHOR

HelicopterMountain47
