OPEN_SOURCE
REDDIT · 32d ago · INFRASTRUCTURE
llama.cpp RPC spreads 72B across two PCs
A LocalLLaMA post shows llama.cpp's RPC backend can split a 36GB Qwen2.5-72B-Instruct quant across an RTX 3090, an old RTX 3060, and about 4.3GB of CPU RAM, yielding roughly 3.76 tokens per second over 1GbE. The bigger takeaway is that local LLM users can turn spare hardware into usable VRAM, though the setup still required a custom Docker build because the stock image lacked RPC support.
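The split described above follows llama.cpp's documented RPC workflow: run an `rpc-server` on each remote machine, then point the client at it with `--rpc`. A minimal sketch (the model filename, IP address, and port are illustrative examples, not values from the post):

```shell
# On the worker machine (e.g. the RTX 3060 box): build llama.cpp with the
# RPC backend enabled and expose its GPU over the LAN.
# WARNING: llama.cpp's docs note the RPC server is unencrypted and
# unauthenticated -- use it only on a trusted private network.
cmake -B build -DGGML_CUDA=ON -DGGML_RPC=ON
cmake --build build --config Release
./build/bin/rpc-server --host 0.0.0.0 --port 50052

# On the main machine (the RTX 3090 box): pass the worker's address via
# --rpc; llama.cpp then distributes tensors across local and remote
# devices automatically, spilling the remainder to CPU RAM.
./build/bin/llama-cli \
  -m qwen2.5-72b-instruct-q3_k_m.gguf \
  --rpc 192.168.1.42:50052 \
  -ngl 99 \
  -p "Hello"
```

Multiple workers can be listed comma-separated in `--rpc`, which is how spare machines become extra pooled VRAM.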
// ANALYSIS
This is exactly the kind of scrappy infrastructure hack that makes local inference more practical, but it also shows llama.cpp RPC is still a power-user feature rather than a polished deployment path.
- llama.cpp's own RPC docs frame the backend as a proof of concept and warn against using it on open networks, so this is impressive but not production-ready
- The reported throughput is modest, yet good enough for solo chat and experimentation when the alternative is not running a 72B model at all
- Automatic tensor distribution across local and remote devices makes the feature unusually approachable once the build works
- The real bottleneck here is packaging and ergonomics: having to rebuild Docker with `-DGGML_RPC=ON` is still too much friction for most users
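The rebuild friction noted above amounts to a few extra build flags; a minimal custom image might look like this (base image tag, package list, and layout are assumptions for illustration, not the poster's actual Dockerfile):

```dockerfile
# Sketch of a CUDA-enabled llama.cpp image with the RPC backend compiled in.
# A -devel base is needed so nvcc is available at build time.
FROM nvidia/cuda:12.4.1-devel-ubuntu22.04
RUN apt-get update && apt-get install -y \
    git cmake build-essential libcurl4-openssl-dev
RUN git clone https://github.com/ggml-org/llama.cpp /llama.cpp
WORKDIR /llama.cpp
# The key difference from the stock image: -DGGML_RPC=ON compiles the
# RPC backend that the stock Docker image lacks.
RUN cmake -B build -DGGML_CUDA=ON -DGGML_RPC=ON && \
    cmake --build build --config Release -j
ENTRYPOINT ["/llama.cpp/build/bin/rpc-server"]
```

Shipping an official RPC-enabled image variant would remove most of this friction.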
// TAGS
llama-cpp · llm · inference · gpu · self-hosted · open-source
DISCOVERED
32d ago
2026-03-10
PUBLISHED
36d ago
2026-03-06
RELEVANCE
8/10
AUTHOR
righcoastmike