Gemma 4 Heretic Q2_K Spits Gibberish
This Reddit post flags the Q2_K GGUF build of the Gemma 4 26B A4B Heretic model as producing gibberish, and suggests the issue may extend to other quants in the repo. The underlying Hugging Face card shows this is a community GGUF release based on `coder3101/gemma-4-26B-A4B-it-heretic`, itself built on Google’s Gemma 4 26B A4B IT model, with Q2_K listed as the smallest option and higher-bit quants like Q4_K_M and Q6_K positioned as better-quality choices.
Hot take: this reads more like “2-bit MoE compression hit the floor” than a fundamentally broken repo. The model is probably not the problem so much as the quantization level, unless there’s also a tokenizer or chat-template mismatch in the runner.
- The repo is a community GGUF packaging of a fine-tuned Gemma 4 26B A4B model, not the official upstream release.
- The Hugging Face card itself implies the lower-bit end is risky: it labels some quants as lower quality and points users toward Q4_K_M or Q6_K for better results.
- The Reddit report is specifically about `Q2_K`, which is the most plausible failure point for gibberish on a model this large and MoE-shaped.
- Inference: if other quants are also broken, the more likely causes are prompt/template wiring or a bad conversion path, not the entire model family (the sketch below is one quick way to tell the two apart).
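One way to probe the "quant floor vs. broken repo" split is to run the same prompt through the Q2_K file and a higher-bit quant, letting the chat template embedded in the GGUF do the formatting so runner-side template wiring is out of the picture. A minimal sketch with llama-cpp-python, assuming local copies of the files; the filenames and prompt are hypothetical, not from the repo:

```python
# Minimal A/B check: same prompt, two quants of the same model.
# Assumes llama-cpp-python is installed and the GGUF files are local.
from llama_cpp import Llama

PROMPT = "Explain, in two sentences, what GGUF quantization does."

def sample(model_path: str) -> str:
    # verbose=False keeps llama.cpp loader logs out of the comparison output
    llm = Llama(model_path=model_path, n_ctx=2048, verbose=False)
    # create_chat_completion uses the chat template shipped in the GGUF
    # metadata (when present), so a runner-side template mismatch is excluded
    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": PROMPT}],
        max_tokens=128,
        temperature=0.2,
    )
    return out["choices"][0]["message"]["content"]

if __name__ == "__main__":
    # Hypothetical filenames; substitute whatever the repo actually ships.
    for path in ("gemma-4-26B-A4B-it-heretic.Q2_K.gguf",
                 "gemma-4-26B-A4B-it-heretic.Q4_K_M.gguf"):
        print(f"--- {path} ---")
        print(sample(path))
```

If Q4_K_M answers coherently while Q2_K still babbles, that points at the 2-bit floor; if both are garbled even with the embedded template, a bad conversion or template metadata is the better suspect.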
DISCOVERED: 2026-04-20
PUBLISHED: 2026-04-19
AUTHOR: Academic-Map268