OPEN_SOURCE
REDDIT // 9d ago // MODEL RELEASE

Gemma-4-26B-A4B hits 15 tps via Unsloth

Google's Gemma-4-26B-A4B MoE model achieves fast local inference on mid-range consumer hardware using Unsloth's MXFP4 quantization: the quantized 26B-parameter model runs at 12-15 tokens per second on an AMD RX 6600.
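For readers who want to try something similar locally, below is a minimal sketch using llama-cpp-python (the same llama.cpp engine that LM Studio wraps). The GGUF filename is hypothetical; substitute whatever quantized file Unsloth actually publishes, and note that an MXFP4-capable, Vulkan-enabled llama.cpp build is assumed for AMD cards like the RX 6600.

from llama_cpp import Llama

# Hypothetical filename: substitute the actual quantized GGUF that Unsloth
# publishes. Assumes a llama.cpp build with GPU support (e.g. Vulkan for
# AMD cards) and MXFP4-capable quant kernels.
llm = Llama(
    model_path="gemma-4-26b-a4b-mxfp4.gguf",
    n_gpu_layers=-1,   # offload as many layers to the GPU as VRAM allows
    n_ctx=8192,        # modest context; raise if memory permits
    verbose=False,
)

out = llm.create_completion(
    "Explain Mixture-of-Experts routing in two sentences.",
    max_tokens=128,
)
print(out["choices"][0]["text"])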

// ANALYSIS

The arrival of MXFP4 quantization for the Gemma 4 family is a significant step toward making high-quality Mixture-of-Experts (MoE) models usable on budget hardware.

  • MoE architecture with 4B active parameters delivers 26B-level reasoning with the compute footprint of a much smaller model
  • MXFP4 (Microscaling FP4) maintains higher effective precision than traditional 4-bit integer quantization, preserving model intelligence (see the block-scaling sketch after this list)
  • Achieving 15 tps on a sub-$200 GPU like the RX 6600 shows that sophisticated AI is no longer gated by high-end VRAM
  • Support for 256K context windows via Unsloth optimizations enables large-scale local RAG and document analysis
  • First-class Vulkan support in LM Studio further expands hardware compatibility beyond NVIDIA users
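
To make the "microscaling" in MXFP4 concrete: in the OCP MX specification, values are grouped into blocks of 32 that share a single power-of-two (E8M0) scale, and each element is stored as a 4-bit E2M1 float. The Python sketch below illustrates the rounding scheme; it is a simplified illustration, not bit-exact with Unsloth's or llama.cpp's kernels.

import numpy as np

# Magnitudes representable by a 4-bit E2M1 element (sign stored separately).
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def mxfp4_quantize_block(block):
    """Quantize one 32-value block: shared power-of-two scale + E2M1 elements.
    Simplified illustration of the MX scheme, not a production kernel."""
    assert block.size == 32
    amax = np.abs(block).max()
    if amax == 0.0:
        return 1.0, np.zeros_like(block)
    # Pick a power-of-two scale so the block maximum lands near the top of
    # the E2M1 range (whose largest magnitude is 6.0 = 1.5 * 2**2).
    scale = 2.0 ** (np.floor(np.log2(amax)) - 2)
    scaled = block / scale
    # Round each element to the nearest representable E2M1 magnitude.
    idx = np.abs(np.abs(scaled)[:, None] - FP4_GRID[None, :]).argmin(axis=1)
    return scale, np.sign(scaled) * FP4_GRID[idx]

rng = np.random.default_rng(0)
weights = rng.normal(size=32).astype(np.float32)
scale, q = mxfp4_quantize_block(weights)
print("scale:", scale, "max abs error:", np.abs(weights - scale * q).max())

Because the shared scale is a power of two, dequantization reduces to an exponent adjustment plus a 4-bit lookup, which is part of why MX formats stay cheap at inference time.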
// TAGS
gemma-4-26b-a4b · unsloth · llm · open-weights · inference · gpu · mxfp4

DISCOVERED

2026-04-03 (9d ago)

PUBLISHED

2026-04-02 (9d ago)

RELEVANCE

9 / 10

AUTHOR

mr_happy_nice