llama.cpp sm120 CUDA build hits Windows snag
The Reddit post asks whether anyone has a clean sm120 CUDA build of llama.cpp working on Windows, after the poster hit compile friction on newer GPUs. They note that Vulkan is stable as a fallback and want to know whether this is toolchain lag or a real blocker in the project.
This looks less like llama.cpp being fundamentally broken and more like Blackwell/CUDA support still settling on Windows. NVIDIA's CUDA 12.8 release added sm_120 compiler support, so the architecture itself is targetable; the rough edges are in the surrounding build stack and kernels. llama.cpp's build docs already cover CUDA, non-native builds, and setting CMAKE_CUDA_ARCHITECTURES explicitly, which gives supported escape hatches when auto-detection misbehaves. Other Windows reports on RTX 5090-class hardware show CUDA builds compiling and detecting compute capability 12.0, so this reads as a fragile compatibility pocket rather than a total lack of support. Vulkan remains the pragmatic fallback if you want stable local inference now instead of spending time on the newest CUDA edge cases.
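For readers hitting the same wall, the escape hatch mentioned above can be sketched as a build invocation that pins the CUDA architecture instead of relying on auto-detection. This is a hedged example, not the poster's exact command: it assumes a CUDA 12.8+ toolkit is installed (the first release with sm_120 support) and uses llama.cpp's documented GGML_CUDA CMake option.

```shell
# From a llama.cpp checkout, configure a CUDA build that targets
# Blackwell (compute capability 12.0) explicitly, bypassing
# architecture auto-detection. Requires CUDA toolkit 12.8 or newer.
cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES="120"

# Build in Release mode (the --config flag matters for the
# Visual Studio generator that CMake defaults to on Windows).
cmake --build build --config Release
```

If the toolchain still chokes on native sm_120 kernels, dropping back to an older architecture value (e.g. "89") produces PTX that the driver can JIT-compile for newer GPUs, at some startup and performance cost.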
DISCOVERED
2026-03-29
PUBLISHED
2026-03-29
AUTHOR
prophetadmin