llama.cpp benchmarks validate cross-vendor eGPU inference

// 95d ago · BENCHMARK RESULT

A developer benchmarked llama.cpp's Vulkan backend on an AMD Strix Halo APU paired with an NVIDIA RTX 5070 Ti eGPU over OCuLink, showing that cross-vendor tensor splitting works reliably. The results counter the common belief that OCuLink's PCIe 4.0 x4 link bottlenecks local LLM token generation.
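A rough sanity check of the bandwidth claim can be sketched in a few lines of Python. This is a back-of-envelope estimate, not the benchmark's method: it assumes that with a layer split only one hidden-state vector crosses the link per boundary per generated token, and the hidden size, activation precision, token rate, and number of crossings are all hypothetical values chosen for illustration. It also ignores PCIe packet overhead and latency.

```python
# Back-of-envelope estimate of PCIe 4.0 x4 utilization during token generation.
# Model figures used here are illustrative assumptions, not measured results.

PCIE4_TRANSFERS_PER_LANE = 16e9   # PCIe 4.0 signals at 16 GT/s per lane
ENCODING_EFFICIENCY = 128 / 130   # 128b/130b line encoding
LANES = 4                         # OCuLink link width named in the article


def link_bytes_per_second(lanes: int = LANES) -> float:
    """Theoretical one-direction data rate of the link in bytes per second."""
    return PCIE4_TRANSFERS_PER_LANE * ENCODING_EFFICIENCY * lanes / 8


def activation_bytes_per_token(hidden_size: int, bytes_per_value: int,
                               crossings: int) -> int:
    """Bytes moved across the link for one generated token (assumed model)."""
    return hidden_size * bytes_per_value * crossings


def link_utilization(tokens_per_second: float, hidden_size: int,
                     bytes_per_value: int = 4, crossings: int = 2) -> float:
    """Fraction of the link's theoretical capacity consumed by activations."""
    per_token = activation_bytes_per_token(hidden_size, bytes_per_value, crossings)
    return tokens_per_second * per_token / link_bytes_per_second()


# Hypothetical workload: hidden size 8192, fp32 activations, 50 tokens/s,
# activations crossing the link twice per token.
capacity = link_bytes_per_second()
utilization = link_utilization(tokens_per_second=50, hidden_size=8192)
print(f"link capacity: {capacity / 1e9:.2f} GB/s")
print(f"estimated utilization: {utilization:.4%}")
```

Under these assumptions the activation traffic is a few megabytes per second against a link that carries several gigabytes per second, which is consistent with the sub-1% figure reported. Prompt processing and model loading move far more data and are not covered by this estimate.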

// ANALYSIS

The benchmark indicates that memory bandwidth, not PCIe link speed, governs local LLM token-generation performance in this heterogeneous setup.

– Vulkan abstracts the hardware layer, so AMD and NVIDIA GPUs run together stably at a 5-10% performance penalty relative to native CUDA or ROCm.
– Active token generation uses less than 1% of OCuLink's bandwidth, because only tiny activation tensors pass between the GPUs.
– Offloading layers to the APU's slower system memory causes a non-linear performance drop, as Amdahl's Law predicts: the fast eGPU stalls while it waits for the APU to finish its layers.
– The results suggest a practical route to high-capacity local inference rigs that pool cheap unified APU memory with fast dedicated eGPU VRAM.
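The non-linear drop from offloading can be illustrated with a simple bandwidth-bound model. This is a minimal sketch under stated assumptions, not the benchmark's methodology: it assumes each generated token reads every weight once, that the two devices process their layers one after the other, and that per-token time is the sum of the time each device spends streaming its share of the weights. The function name and the bandwidth and model-size figures are hypothetical round numbers.

```python
# Amdahl-style model of token throughput when layers are split between a
# fast device (eGPU VRAM) and a slow device (APU system memory).
# All numbers used below are illustrative assumptions, not measurements.


def tokens_per_second(model_gb: float, slow_fraction: float,
                      fast_gb_per_s: float, slow_gb_per_s: float) -> float:
    """Estimate tokens/s when `slow_fraction` of the weights sit in slow memory."""
    if not 0.0 <= slow_fraction <= 1.0:
        raise ValueError("slow_fraction must be between 0 and 1")
    # Time spent streaming the weights held by each device for one token.
    fast_time = model_gb * (1.0 - slow_fraction) / fast_gb_per_s
    slow_time = model_gb * slow_fraction / slow_gb_per_s
    # The devices run in sequence, so the per-token times add up.
    return 1.0 / (fast_time + slow_time)


# Hypothetical setup: 40 GB of weights, 900 GB/s VRAM, 250 GB/s system memory.
for fraction in (0.0, 0.25, 0.5, 0.75, 1.0):
    rate = tokens_per_second(40.0, fraction, 900.0, 250.0)
    print(f"{fraction:.0%} of weights in slow memory: {rate:.1f} tokens/s")
```

In this model, moving half of the weights to the slow device costs more than half of the throughput gap between the two extremes, because the slow portion dominates the per-token time. That is the same shape Amdahl's Law describes for the serial fraction of a workload.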

// TAGS

llama-cpp, llm, inference, gpu, benchmark

DISCOVERED

95d ago

2026-04-08

PUBLISHED

95d ago

2026-04-07

RELEVANCE

8/10

AUTHOR

xspider2000
