Unsloth updates Mistral Small 4 quants
Unsloth updated its Mistral Small 4 GGUF repo with refreshed quants and chat-template fixes. The model card says the release targets local inference workflows, including llama.cpp and vLLM, for Mistral Small 4, a 119B MoE model.
This is less a flashy launch than a practical correction: when a big local model behaves badly, the quant/template layer is often the real bug.
- The repo explicitly calls out “Unsloth chat template fixes,” which is the kind of downstream fix that can materially change output quality
- For local runners, updated GGUFs matter more than benchmark slides because broken prompting can make a strong base model look mediocre
- Mistral Small 4 is still a heavyweight 119B MoE model, so these quants are about making it usable, not making it cheap
- The note to use `--jinja` in llama.cpp suggests the release is also about interoperability, not just compression
- If you compared older quants and got mixed results, this is the version worth re-testing before writing the model off
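To make the "usable, not cheap" and `--jinja` points concrete, here is a minimal Python sketch. It does two things: it estimates the size of the weights alone for a 119B-parameter model at a few nominal bit widths, and it assembles a `llama-server` command line that includes `--jinja` so the chat template embedded in the GGUF is applied. The bit widths are illustrative round numbers rather than the effective bits per weight of any specific quant in the repo, and the model path, context size, and port are hypothetical placeholders.

```python
import shlex


def approx_weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Rough size of the weights alone, in GiB.

    Ignores KV cache, activations, and runtime overhead, so real memory
    use will be higher than this figure.
    """
    return n_params * bits_per_weight / 8 / 2**30


def build_llama_server_cmd(
    model_path: str, ctx_size: int = 8192, port: int = 8080
) -> list[str]:
    """Assemble a llama-server invocation that applies the GGUF's chat template.

    model_path, ctx_size, and port are placeholder values for illustration.
    """
    return [
        "llama-server",
        "-m", model_path,      # path to a local GGUF file
        "--jinja",             # use the Jinja chat template stored in the GGUF
        "-c", str(ctx_size),   # context window size in tokens
        "--port", str(port),
    ]


if __name__ == "__main__":
    # Nominal bit widths only; actual quants mix precisions per tensor.
    for bits in (4.0, 8.0, 16.0):
        size = approx_weight_gib(119e9, bits)
        print(f"{bits:>4} bits/weight: ~{size:.0f} GiB of weights")

    # Hypothetical local path; substitute the file you actually downloaded.
    print(shlex.join(build_llama_server_cmd("path/to/model.gguf")))
```

Even at a nominal 4 bits per weight the weights alone run to tens of GiB, which is why the quants make the model runnable on large local setups rather than inexpensive ones.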
DISCOVERED
2026-04-19
PUBLISHED
2026-04-19
AUTHOR
Altruistic_Heat_9531