llama.cpp lands native NVFP4 on Blackwell

// 45d ago · BENCHMARK RESULT

llama.cpp b8967 adds native NVFP4 support for Blackwell GPUs, backed by a fresh CUDA benchmark run on an RTX 5090-class system. The posted results show prefill throughput above 5.5K tok/s at short contexts and roughly 70 tok/s decode on a Qwen3.6 27B NVFP4 model.

// ANALYSIS

This is a meaningful infrastructure win for local AI on Nvidia’s newest hardware: the format support is in place, and the bench numbers suggest it is already useful for real workloads, not just a compatibility checkbox.

  • Native NVFP4 matters because Blackwell’s 4-bit path is part of the hardware story, not an afterthought; llama.cpp is now tracking that story closely.
  • The benchmark profile is heavily prefill-friendly: 5.5K+ tok/s at short contexts, then a gradual drop as depth increases, which is what you’d expect when attention cost and KV-cache traffic grow with context, rather than from a fixed compute ceiling.
  • Decode around 70 tok/s on a 27B model is strong for a local setup, especially with the whole model on a single GPU and no CPU offload in the test.
  • This is a good signal for Blackwell owners, but it is still one data point from one model and one build; different architectures, contexts, and batch shapes can change the picture.
  • The release note framing suggests this is the first real integration step, so expect follow-up fixes and tuning as more NVFP4 models and kernels land.
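
To put the decode figure in context, here is a rough back-of-envelope sketch in Python. It is not from the post: it assumes NVFP4 stores 4-bit values in 16-element blocks with one 8-bit scale per block, that every weight of a dense 27B model is stored that way, and that each decoded token reads all weights once. Real GGUF files usually keep some tensors at higher precision, so treat the output as an order-of-magnitude estimate only.

```python
# Back-of-envelope estimate for a 27B-parameter NVFP4 model.
# Assumptions (NOT from the post): 16-element blocks with one 8-bit scale each,
# all weights stored in NVFP4, one full pass over the weights per decoded token.

PARAMS = 27e9          # parameter count implied by the "27B" model name
DECODE_TOK_S = 70.0    # approximate decode rate quoted in the post
BLOCK_SIZE = 16        # assumed NVFP4 block size
VALUE_BITS = 4         # 4-bit element
SCALE_BITS = 8         # assumed one 8-bit scale per block


def bits_per_weight(value_bits=VALUE_BITS, scale_bits=SCALE_BITS,
                    block_size=BLOCK_SIZE):
    """Effective storage per weight, with the block scale amortized."""
    return value_bits + scale_bits / block_size


def weight_bytes(params=PARAMS):
    """Approximate size of the quantized weights in bytes."""
    return params * bits_per_weight() / 8


def implied_read_rate(tok_s=DECODE_TOK_S):
    """Bytes of weights streamed per second if every token reads them all once."""
    return weight_bytes() * tok_s


if __name__ == "__main__":
    print(f"bits per weight:   {bits_per_weight():.2f}")
    print(f"weights:           {weight_bytes() / 1e9:.1f} GB")
    print(f"implied read rate: {implied_read_rate() / 1e12:.2f} TB/s")
```

Under these assumptions the weights come to about 15.2 GB and 70 tok/s implies roughly 1.06 TB/s of weight traffic, which fits the post's picture of the whole model sitting on a single GPU with no CPU offload.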
// TAGS
llama-cpp, gpu, inference, benchmark, open-source

DISCOVERED: 2026-04-29 (45d ago)

PUBLISHED: 2026-04-29 (45d ago)

RELEVANCE: 9/10

AUTHOR: mossy_troll_84