Llama.cpp Vulkan tensor splitting remains unstable for AMD users
AMD users attempting to run large dense models in llama.cpp using Vulkan and tensor-split mode are reporting consistent core dumps. While layer splitting remains a viable workaround for multi-GPU setups, true tensor parallelism on AMD hardware via Vulkan is still highly experimental.
The struggle to get multi-GPU AMD setups working smoothly highlights the ongoing gap between the maturity of the CUDA backend and that of the alternatives.
- Tensor splitting (-sm tensor) attempts to compute each layer in parallel across GPUs, but it currently triggers segfaults in the Vulkan backend for large dense models
- The community strongly recommends falling back to layer splitting (-sm layer), which assigns whole layers to each GPU and runs them in sequence, and is significantly more stable
- Explicitly reducing the context size sometimes prevents the crashes, which points to possible memory-handling bugs in the Vulkan backend's tensor-split path
- This is a reminder that local AI on non-Nvidia hardware often means choosing between advanced performance optimizations and basic stability
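The workaround described above (try tensor splitting, shrink the context if it crashes, then fall back to layer splitting) can be sketched as a small launcher. This is a minimal illustration, not a tested recipe: the -sm values are taken from the text as written, the binary name and model path are hypothetical, and it assumes the chosen binary accepts -m, -ngl, -sm and -c and exits on its own (for example a short one-off generation passed through the hypothetical extra_args parameter).

```python
import subprocess
from typing import Callable, List, Optional, Sequence

# Split modes in the order we want to try them; "tensor" is the
# experimental mode named in the text, "layer" is the stable fallback.
SPLIT_MODES = ("tensor", "layer")


def build_command(binary: str, model: str, split_mode: str,
                  ctx_size: int, extra_args: Sequence[str] = ()) -> List[str]:
    """Assemble one command line (binary and model path are hypothetical)."""
    return [
        binary,
        "-m", model,           # path to the GGUF model file
        "-ngl", "99",          # offload all layers to the GPUs
        "-sm", split_mode,     # split mode, as written in the text
        "-c", str(ctx_size),   # explicit context size
        *extra_args,           # assumption: caller supplies a short workload
    ]


def run_with_fallback(binary: str, model: str, ctx_sizes: Sequence[int],
                      extra_args: Sequence[str] = (),
                      run: Callable[[List[str]], int] = subprocess.call
                      ) -> Optional[List[str]]:
    """Return the first command that exits with code 0, or None.

    For each split mode, every context size is tried in the given order
    before moving on to the next mode. On POSIX, subprocess reports a
    process killed by a signal as a negative return code (-11 for SIGSEGV),
    so any non-zero code is treated as a failed attempt.
    """
    for mode in SPLIT_MODES:
        for ctx in ctx_sizes:
            cmd = build_command(binary, model, mode, ctx, extra_args)
            if run(cmd) == 0:
                return cmd
    return None
```

Passing ctx_sizes in descending order, such as [8192, 4096], tries the larger context first under each mode. The run parameter is injectable so the fallback logic can be exercised without a GPU.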
Discovered: 2026-05-26
Published: 2026-05-26
Author: ParaboloidalCrest