llama.cpp hits Blackwell with NVFP4 support
llama.cpp adds native NVFP4 support for NVIDIA Blackwell GPUs, offering up to 2.3x faster prompt processing. A "missing models" bug in the llama-server UI, reported after recent builds, is typically resolved by passing an explicit model path.
The integration of NVFP4 makes Blackwell the new gold standard for local LLM inference, while the project continues to refine its standalone server architecture. NVFP4, NVIDIA's 4-bit floating-point format, lets Blackwell GPUs deliver large speedups with minimal quality loss. The convert_hf_to_gguf.py script now natively handles ModelOpt-optimized Safetensors, streamlining the pipeline from Hugging Face to GGUF. Reports of "missing models" in the llama-server UI are typically resolved by checking the --model flag or by verifying that the binary was compiled with the correct Blackwell architecture flags. This update solidifies llama.cpp as the leading edge for running high-performance quantized models on consumer-grade hardware.
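The pipeline described above can be sketched as a shell session. This is a minimal sketch, not an official recipe: the model path, output filename, and the CUDA architecture value (Blackwell consumer parts report compute capability 12.0) are assumptions to adapt to your setup, and flag names should be checked against the current llama.cpp documentation.

```shell
# Build llama.cpp with CUDA enabled, targeting Blackwell.
# "120" (compute capability 12.0) is an assumption for consumer
# Blackwell cards; adjust for your GPU.
cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES="120"
cmake --build build --config Release -j

# Convert a ModelOpt-optimized Hugging Face checkpoint to GGUF.
# ./my-modelopt-model and the output name are illustrative paths.
python convert_hf_to_gguf.py ./my-modelopt-model \
    --outfile my-model-nvfp4.gguf

# Launch the server with an explicit --model path; omitting it is
# a common cause of the "missing models" UI report.
./build/bin/llama-server --model my-model-nvfp4.gguf --port 8080
```

Passing the model path explicitly, rather than relying on auto-discovery, is the simplest way to rule out the UI bug before suspecting the build flags.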
DISCOVERED
2026-03-25
PUBLISHED
2026-03-25
AUTHOR
mossy_troll_84