llama.cpp b9095 adds NCCL-free tensor parallelism
llama.cpp b9095 adds an internal CUDA AllReduce path for `LLAMA_SPLIT_MODE_TENSOR`, letting dual-GPU setups run tensor parallelism without NCCL. The release notes call out a current target of 2 GPUs, FP32, and tensors up to 256 KB.
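The post doesn't show the kernel, but the basic shape of an NCCL-free AllReduce between two GPUs is easy to sketch: each GPU copies its partial result to its peer and sums it into its local buffer. The snippet below is an illustrative CUDA sketch of that pattern, not the actual llama.cpp implementation; the buffer names and helper function are invented for the example, and allocation and error handling are omitted.

```cpp
#include <cuda_runtime.h>

// Elementwise in-place add: dst[i] += src[i].
__global__ void add_inplace(float *dst, const float *src, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) dst[i] += src[i];
}

// Illustrative two-GPU AllReduce (sum) with no NCCL dependency.
// d_buf[g] holds GPU g's partial tensor; d_tmp[g] is same-size scratch on GPU g.
// After the call, both copies hold the elementwise sum.
void allreduce_sum_2gpu(float *d_buf[2], float *d_tmp[2], int n) {
    // Enable P2P both ways (returns an error, ignored here, if already enabled).
    cudaSetDevice(0); cudaDeviceEnablePeerAccess(1, 0);
    cudaSetDevice(1); cudaDeviceEnablePeerAccess(0, 0);

    // Exchange partials; cudaMemcpyPeer stages through host memory
    // when direct peer access is unavailable.
    cudaMemcpyPeer(d_tmp[1], 1, d_buf[0], 0, (size_t)n * sizeof(float));
    cudaMemcpyPeer(d_tmp[0], 0, d_buf[1], 1, (size_t)n * sizeof(float));

    // Each GPU folds the remote partial into its local copy.
    const int threads = 256, blocks = (n + threads - 1) / threads;
    cudaSetDevice(0); add_inplace<<<blocks, threads>>>(d_buf[0], d_tmp[0], n);
    cudaSetDevice(1); add_inplace<<<blocks, threads>>>(d_buf[1], d_tmp[1], n);
    cudaSetDevice(0); cudaDeviceSynchronize();
    cudaSetDevice(1); cudaDeviceSynchronize();
}
```

For buffers under the stated 256 KB cap, a single exchange-and-add round trip like this is cheap enough that a full collectives library buys little, which presumably motivates the narrow initial scope.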
This is a meaningful infrastructure step for local inference: it lowers the dependency burden for multi-GPU tensor parallelism and makes dual-GPU consumer Blackwell rigs easier to bring up.
- The new internal AllReduce is explicitly NCCL-free, which matters most on desktop-class NVIDIA setups where NCCL can be a setup friction point
- The implementation is still narrow in scope, so this is a practical win for specific dual-GPU workflows (see the API sketch after this list) rather than a universal multi-GPU answer
- The release notes say the kernel works on Volta-or-newer NVIDIA GPUs, so the impact is broader than the Reddit title implies
- `GGML_CUDA_ALLREDUCE` and `--allreduce` make it easy to compare the internal and NCCL paths and to debug regressions
- For local model builders, this kind of plumbing change can improve throughput and reliability without changing the model stack
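If the new split mode is surfaced through llama.cpp's existing model-params C API (an assumption here; only the `LLAMA_SPLIT_MODE_TENSOR` enum name appears in the release summary), opting a dual-GPU run into it would look roughly like the sketch below; the model path is a placeholder.

```cpp
#include "llama.h"

int main() {
    llama_backend_init();

    // Assumed usage: LLAMA_SPLIT_MODE_TENSOR is taken from the release
    // summary; the surrounding calls are llama.cpp's existing C API.
    llama_model_params mp = llama_model_default_params();
    mp.n_gpu_layers = 999;                      // offload all layers
    mp.split_mode   = LLAMA_SPLIT_MODE_TENSOR;  // tensor parallelism across both GPUs

    llama_model *model = llama_model_load_from_file("model.gguf", mp);
    if (!model) return 1;

    // ... create a context and generate as usual; the AllReduce runs inside
    // the CUDA backend, so no NCCL is needed at build or run time ...

    llama_model_free(model);
    llama_backend_free();
    return 0;
}
```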
DISCOVERED 2026-05-10 · PUBLISHED 2026-05-10 · AUTHOR Bulky-Priority6824