Llama.cpp enables MXFP4 on older NVIDIA GPUs

// 45d ago · INFRASTRUCTURE

A new update to llama.cpp adds support for MXFP4 (Microscaling Floating Point 4-bit) quantization on older NVIDIA architectures, removing the previous requirement for Blackwell hardware. Users are taking advantage of this to run sparse MoE models such as Qwen 3.6 35B A3B on consumer cards like the RTX 3080.
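
For readers unfamiliar with the format, here is a minimal sketch of how one MXFP4 block decodes under the OCP Microscaling definition: 32 four-bit E2M1 elements share a single 8-bit power-of-two (E8M0) scale. The function names and the nibble packing order are assumptions made for illustration; this is not llama.cpp's actual memory layout or kernel code.

```python
import math

# The eight non-negative magnitudes an E2M1 (4-bit float) code can represent.
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

BLOCK_SIZE = 32  # elements per shared scale in the MX format


def decode_e2m1(code: int) -> float:
    """Decode a 4-bit code: bit 3 is the sign, bits 0-2 index the magnitude."""
    sign = -1.0 if code & 0x8 else 1.0
    return sign * E2M1_MAGNITUDES[code & 0x7]


def decode_block(scale_e8m0: int, packed: bytes) -> list[float]:
    """Decode one block of 32 elements packed two per byte.

    Assumption: the low nibble of each byte holds the earlier element.
    Real implementations may order nibbles differently.
    """
    if len(packed) != BLOCK_SIZE // 2:
        raise ValueError("expected 16 bytes for a 32-element block")
    if scale_e8m0 == 0xFF:
        # E8M0 reserves 0xFF for NaN.
        return [math.nan] * BLOCK_SIZE
    scale = 2.0 ** (scale_e8m0 - 127)  # E8M0 is a biased power-of-two exponent
    values = []
    for byte in packed:
        values.append(decode_e2m1(byte & 0x0F) * scale)
        values.append(decode_e2m1(byte >> 4) * scale)
    return values


def bits_per_weight(block_size: int = BLOCK_SIZE,
                    element_bits: int = 4,
                    scale_bits: int = 8) -> float:
    """Effective storage cost once the shared scale is amortized over the block."""
    return (block_size * element_bits + scale_bits) / block_size


print(decode_block(127, bytes([0x21] * 16))[:2])  # [0.5, 1.0]
print(bits_per_weight())                          # 4.25
```

The shared scale is why the effective cost is 4.25 bits per weight rather than 4.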

// ANALYSIS

Emulating a Blackwell-native format in software opens efficient 4-bit inference to the large installed base of older GPUs.

  • MXFP4 provides a superior perplexity-to-size ratio compared to standard 4-bit GGUF quants, making it a "sweet spot" for mid-sized models.
  • On Ampere and Ada cards, which lack native FP4 hardware, MXFP4 weights are decoded in software and computed through integer dot-product instructions such as DP4A, trading some inference speed for better model quality.
  • The Qwen 3.6 35B A3B model's sparse architecture (activating only about 3B of its 35B parameters per token) makes it a practical fit for 10GB-12GB VRAM cards when paired with this format.
  • While native Blackwell support yields a 25-33% speedup, the open-source community's ability to backport these formats extends the useful life of older hardware for hobbyists.
  • Developers should still benchmark against IQ4_XS, as the software overhead of MXFP4 on older cards can vary significantly depending on the specific kernel implementation.
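
The sizing behind the VRAM point above can be checked with a rough back-of-the-envelope estimate. This sketch assumes every weight is stored at MXFP4's 4.25 bits per weight and ignores the KV cache, activations, and any tensors kept at higher precision. The helper names are hypothetical.

```python
MXFP4_BITS_PER_WEIGHT = 4.25  # 32 x 4-bit elements + one 8-bit scale per block


def weight_gigabytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


def fits_in_vram(size_gb: float, vram_gb: float) -> bool:
    """Naive check that leaves no headroom for KV cache or activations."""
    return size_gb <= vram_gb


total_gb = weight_gigabytes(35e9, MXFP4_BITS_PER_WEIGHT)   # all parameters
active_gb = weight_gigabytes(3e9, MXFP4_BITS_PER_WEIGHT)   # ~3B active per token

print(f"all weights:    {total_gb:.2f} GB")   # all weights:    18.59 GB
print(f"active weights: {active_gb:.2f} GB")  # active weights: 1.59 GB
print(fits_in_vram(total_gb, 12))             # False
print(fits_in_vram(active_gb, 10))            # True
```

Under these assumptions the full weight set is larger than a 10GB-12GB card, so part of it would have to live outside VRAM. The small per-token active set is what keeps that arrangement workable.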
// TAGS
llama-cpp, gpu, inference, llm, open-source, mxfp4, qwen

DISCOVERED

45d ago

2026-04-21

PUBLISHED

45d ago

2026-04-21

RELEVANCE

8/10

AUTHOR

autisticit