llama.cpp multi-GPU bench needs row split

// 2h ago · TUTORIAL

A LocalLLaMA user hit a `llama-bench` quirk on a DGX-1-style setup: `--device CUDA0,CUDA1` caused the benchmark to sweep GPUs one at a time instead of tensor-splitting a Qwen3.6-27B run across both cards. The thread points toward `-sm row` for tensor parallelism, with `CUDA_VISIBLE_DEVICES` and `-ts 1,1` as the cleaner way to control which GPUs participate.
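A minimal sketch of the kind of invocation the thread points toward, not a verified recipe. The model path `model.gguf` and the `RUN` switch are placeholders introduced here for illustration. It also assumes that `llama-bench` treats commas as separators between values to sweep and uses `/` between per-GPU proportions inside a single `-ts` value, which is why the split is written `1/1` below; confirm against `llama-bench --help` on your build. By default the script only prints the command so it can be inspected first.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder values (assumptions): adjust for your setup.
MODEL="${MODEL:-model.gguf}"   # hypothetical model path
GPUS="${GPUS:-0,1}"            # GPU indices exposed to the process

# Argument list for llama-bench.
args=(
  -m "$MODEL"
  -sm row                # row split across the visible GPUs
  -ts 1/1                # equal share per GPU (assumed '/' syntax)
  -d 4096,16384,65536    # context depths to sweep, as in the thread
)

# Print the full command for inspection.
cmd="CUDA_VISIBLE_DEVICES=$GPUS llama-bench ${args[*]}"
echo "$cmd"

# Execute only when explicitly requested with RUN=1.
if [ "${RUN:-0}" = "1" ]; then
  CUDA_VISIBLE_DEVICES="$GPUS" llama-bench "${args[@]}"
fi
```

Restricting visibility through the environment variable means the process only ever sees the two intended cards, so the split settings are the only thing left to decide which GPU does what.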

// ANALYSIS

The important bit is that `--device` is a visibility filter, not a magic “parallelize these two GPUs” switch, and `-sm tensor` is still experimental enough that the default behavior can surprise you.

  • `llama-bench` runs combinations of test parameters, so the user’s `-d 4096,16384,65536` sweep was expected to iterate; the real mistake was assuming `--device CUDA0,CUDA1` would combine the GPUs into one shared benchmark run
  • The maintainer guidance in llama.cpp says `-sm layer` is mostly pipeline parallelism for larger prompts, while `-sm row` is the tensor-parallel mode if you want work spread across GPUs
  • On short or moderate contexts, multi-GPU overhead can erase the gain, so “two cards” does not automatically beat one fast card
  • For repeatable benchmarking, `CUDA_VISIBLE_DEVICES=0,1` plus explicit split settings is usually less ambiguous than trying to encode topology directly in `--device`
  • This is the kind of llama.cpp tuning problem that matters for real inference deployments, not just synthetic benchmarks
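To make the first point above concrete, the loop below sketches how comma-separated values expand into separate test combinations. It is an illustration only, not llama-bench's actual implementation, and the real tool may run more than one test type per combination. The point is that each run receives a single device value, so two listed devices yield separate single-GPU runs rather than one shared run.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Comma-separated values, as passed on the command line in the thread.
devices="CUDA0,CUDA1"
depths="4096,16384,65536"

# Split each list on commas into an array.
IFS=',' read -r -a dev_list <<< "$devices"
IFS=',' read -r -a depth_list <<< "$depths"

# One combination per (device, depth) pair: each run sees one device value.
count=0
for dev in "${dev_list[@]}"; do
  for d in "${depth_list[@]}"; do
    echo "test: device=$dev depth=$d"
    count=$((count + 1))
  done
done
echo "total combinations: $count"
```

Two device values times three depths gives six combinations, each on a single GPU, which matches the one-at-a-time sweep the user observed.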
// TAGS
llama-cpp, llama-bench, llm, gpu, benchmark, inference, devtool, open-source

DISCOVERED

2h ago

2026-05-09

PUBLISHED

4h ago

2026-05-09

RELEVANCE

8/10

AUTHOR

starkruzr