Gemma-4-26B-A4B hits 15 tps via Unsloth

// 55d ago · MODEL RELEASE

Google's Gemma-4-26B-A4B, a Mixture-of-Experts (MoE) model, runs locally at 12-15 tokens per second (tps) on an AMD RX 6600, a mid-range consumer GPU, when quantized with Unsloth's MXFP4 format.
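
To put the 26B figure in context, here is a rough back-of-the-envelope estimate of weight storage at MXFP4 precision. It is a sketch under stated assumptions, not a measurement: the parameter counts are the nominal 26B total and 4B active taken from the model name, the 4.25 bits per weight comes from the MX block layout (4-bit elements plus one 8-bit scale per 32 elements), and `weight_gib` is a hypothetical helper. Real quantized files usually keep some tensors at higher precision and also need room for the KV cache, so actual usage will differ.

```python
def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GiB for a parameter count and bit width."""
    total_bits = params_billion * 1e9 * bits_per_weight
    return total_bits / 8 / 2**30

# Assumption: MX block layout, one 8-bit shared scale per 32 four-bit elements.
MXFP4_BITS = 4 + 8 / 32

# Assumption: nominal counts read off the model name (26B total, 4B active).
total = weight_gib(26, MXFP4_BITS)   # every expert has to be stored somewhere
active = weight_gib(4, MXFP4_BITS)   # weights actually read for one token

print(f"total weights:  {total:.1f} GiB")
print(f"active weights: {active:.1f} GiB")
```

The RX 6600 ships with 8 GB of VRAM (a detail not stated in the item), so under these assumptions the full weight set would not fit on the card, and part of it would presumably sit in system RAM. The small active set per token is what would keep generation speed usable in that arrangement.
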

// ANALYSIS

MXFP4 quantization for the Gemma 4 family makes high-quality Mixture-of-Experts (MoE) models considerably more practical to run on budget hardware.

  • The MoE architecture activates about 4B parameters per token, so per-token compute is close to that of a much smaller model while the full 26B parameters remain available (all 26B weights must still be stored)
  • MXFP4 (4-bit Microscaling Floating Point) shares one scale across each small block of values, which is intended to preserve more accuracy than traditional 4-bit integer quantization
  • Reaching 15 tps on a sub-$200 GPU like the RX 6600 suggests that capable local models no longer require high-end VRAM
  • Support for 256K context windows via Unsloth optimizations enables large-scale local RAG and document analysis
  • First-class Vulkan support in LM Studio extends hardware compatibility beyond NVIDIA GPUs
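
The microscaling idea behind MXFP4 can be sketched in a few lines. This is a minimal illustration, assuming the block layout of the OCP Microscaling (MX) formats: 32 elements per block, FP4 E2M1 elements, and a power-of-two shared scale. It is not Unsloth's or any runtime's actual implementation, and `quantize_block` and `dequantize_block` are hypothetical names. Ties between two grid points are resolved toward the smaller magnitude, which may differ from a real encoder.

```python
import math

# Magnitudes representable by an FP4 E2M1 element (1 sign, 2 exponent, 1 mantissa bit).
FP4_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 32  # elements that share one scale
ELEM_EMAX = 2    # exponent of the largest E2M1 value (6 = 1.5 * 2**2)

def quantize_block(values):
    """Return (shared_exponent, codes); codes are signed FP4 grid values."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return 0, [0.0] * len(values)
    # Power-of-two scale chosen so the largest value lands near the top of the grid.
    shared_exp = math.floor(math.log2(max_abs)) - ELEM_EMAX
    scale = 2.0 ** shared_exp
    codes = []
    for v in values:
        scaled = min(abs(v) / scale, FP4_GRID[-1])  # clamp to the largest magnitude
        nearest = min(FP4_GRID, key=lambda g: abs(g - scaled))
        codes.append(math.copysign(nearest, v))
    return shared_exp, codes

def dequantize_block(shared_exp, codes):
    """Reconstruct approximate values from the shared exponent and FP4 codes."""
    scale = 2.0 ** shared_exp
    return [c * scale for c in codes]

# Usage: quantize one block of 32 values and measure the worst-case error.
block = [0.1 * i - 1.6 for i in range(BLOCK_SIZE)]
exp, codes = quantize_block(block)
restored = dequantize_block(exp, codes)
max_err = max(abs(a - b) for a, b in zip(block, restored))
```

Because the scale is stored once per block rather than once per tensor, an outlier only coarsens the 31 values that share its block, which is the usual argument for block-scaled formats over a single tensor-wide 4-bit scale.
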
// TAGS
gemma-4-26b-a4b, unsloth, llm, open-weights, inference, gpu, mxfp4

DISCOVERED

55d ago

2026-04-03

PUBLISHED

55d ago

2026-04-02

RELEVANCE

9/10

AUTHOR

mr_happy_nice