Unsloth fixes MiniMax-M2.7 GGUF overflows
Unsloth found that up to 38% of community MiniMax-M2.7 GGUFs produce NaN perplexity. The cause was traced to overflow in llama.cpp, specifically within the `ffn_down_exps` block, and has been addressed in updated quants.
This investigation highlights the fragility of large-scale quantization when edge-case overflows are triggered by specific model architectures.
- Failures did not track quant size: the medium-sized Q4_K_XL quant failed while the smaller IQ4_XS I-quant remained stable.
- The overflow was pinpointed to `blk.61.ffn_down_exps` and appeared around chunk 32 of perplexity evaluations.
- CUDA 13.2 is flagged as a major culprit for gibberish output across low-bit quants, with CUDA 13.1 recommended as a stable fallback.
- The fix demonstrates the value of "Dynamic 2.0" quantization in catching and mitigating architecture-specific failures that standard community quants missed.
- The episode makes a case for chunk-by-chunk PPL monitoring as a validation step, since an average or early-chunk check can miss a failure that only appears later in the run.
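Chunk-by-chunk PPL monitoring as described above can be sketched in a few lines of Python. This is a minimal illustration, not Unsloth's tooling: it assumes the evaluation log reports running perplexity as `[chunk]value` pairs (for example `[1]4.2031,[2]4.8811,`), and the helper names `parse_chunk_ppl` and `first_bad_chunk` are hypothetical.

```python
import math
import re

# Assumed log format: "[chunk_index]value" pairs, e.g. "[1]4.2031,[2]nan,".
# Adjust the pattern if your perplexity tool prints something different.
CHUNK_RE = re.compile(
    r"\[(\d+)\]\s*(-?nan|-?inf|[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)",
    re.IGNORECASE,
)

def parse_chunk_ppl(log_text):
    """Return (chunk_index, perplexity) pairs found in the log text."""
    return [(int(idx), float(val)) for idx, val in CHUNK_RE.findall(log_text)]

def first_bad_chunk(chunks):
    """Return the index of the first chunk whose perplexity is NaN or infinite."""
    for idx, ppl in chunks:
        if not math.isfinite(ppl):
            return idx
    return None  # every chunk was finite

if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    sample = "[1]4.2031,[2]4.8811,[3]nan,[4]nan,"
    print(first_bad_chunk(parse_chunk_ppl(sample)))
```

Checking every chunk rather than only the final figure tells you where a run first went non-finite, which is the kind of detail that localized this failure to a specific chunk.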
DISCOVERED
2026-04-15
PUBLISHED
2026-04-14
AUTHOR
danielhanchen