Llama.cpp Vulkan tensor splitting remains unstable for AMD users

// 45d ago · INFRASTRUCTURE

AMD users attempting to run large dense models in llama.cpp using Vulkan and tensor-split mode are reporting consistent core dumps. While layer splitting remains a viable workaround for multi-GPU setups, true tensor parallelism on AMD hardware via Vulkan is still highly experimental.

// ANALYSIS

The struggle to get multi-GPU AMD setups working smoothly highlights the ongoing gap between CUDA's maturity and alternative backends.

– Tensor splitting (-sm tensor) attempts to run each layer's computation in parallel across GPUs, but it currently triggers segfaults in the Vulkan backend for large dense models.
– The community strongly recommends falling back to layer splitting (-sm layer), which assigns whole layers to each GPU and runs them one after another; it is significantly more stable.
– Explicitly reducing the context size sometimes prevents the crashes, which points to possible memory-handling bugs in Vulkan's tensor-split implementation.
– This is a reminder that local AI on non-Nvidia hardware often means choosing between advanced performance optimizations and basic stability.
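
The fallback order described above can be sketched as a small helper that builds llama.cpp command lines, from the most ambitious setting to the most stable one. This is only an illustration under stated assumptions: the function names, the model path, the default context size of 8192, and the choice of llama-server as the binary are all hypothetical, and the "tensor" split mode is taken from the report above rather than verified against any particular llama.cpp build. The flags -m, -ngl, -sm, and -c are the usual llama.cpp options for model path, GPU layer count, split mode, and context size.

```python
from typing import List, Optional


def build_llama_args(
    model_path: str,
    split_mode: str = "layer",
    ctx_size: Optional[int] = None,
    n_gpu_layers: int = 99,
    binary: str = "llama-server",
) -> List[str]:
    """Assemble a llama.cpp command line as a list of arguments.

    All defaults here are illustrative assumptions, not recommended values.
    """
    args = [binary, "-m", model_path, "-ngl", str(n_gpu_layers), "-sm", split_mode]
    if ctx_size is not None:
        # An explicit, smaller context reduces the KV cache allocation.
        args += ["-c", str(ctx_size)]
    return args


def fallback_plan(model_path: str, ctx_size: int = 8192) -> List[List[str]]:
    """Return commands to try in order, ending with the most stable one."""
    return [
        # 1. Tensor split with the default context (reported to crash on Vulkan).
        build_llama_args(model_path, "tensor"),
        # 2. Tensor split with an explicitly reduced context.
        build_llama_args(model_path, "tensor", ctx_size),
        # 3. Layer split with the same reduced context, the suggested workaround.
        build_llama_args(model_path, "layer", ctx_size),
    ]


if __name__ == "__main__":
    # "model.gguf" is a placeholder path.
    for cmd in fallback_plan("model.gguf"):
        print(" ".join(cmd))
```

Each list can be passed to subprocess.run as-is; the sketch deliberately stops short of launching anything, since whether a given step crashes depends on the hardware, driver, and llama.cpp version.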

// TAGS

llama-cpp · inference · gpu · local-first · open-source · llm

DISCOVERED: 2026-05-26 (45d ago)

PUBLISHED: 2026-05-26 (45d ago)

RELEVANCE: 6/10

AUTHOR: ParaboloidalCrest
