Mistral Medium 3.5 128B loops on Q4_K_XL
A Reddit user reports that Mistral Medium 3.5 128B, running locally at Q4_K_XL on an M2 Max with 96 GB of memory, starts repeating or looping after roughly 500 to 1000 tokens even on the latest llama.cpp build. The thread is framed as a troubleshooting question, with uncertainty about whether the behavior comes from llama.cpp, Unsloth, or the quantization/inference stack rather than the model itself.
This reads more like long-context serving or quantization instability than a model-release story: the failure only appears after sustained generation, and the reporter is already on a current backend build. It is a local-inference report rather than an official announcement, and the root cause remains unconfirmed among llama.cpp, Unsloth's quantization, and the rest of the inference stack.
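The reported failure mode, repetition or looping after roughly 500 to 1000 tokens, can be quantified with a simple n-gram repetition check over the generated token stream. The sketch below is illustrative only and not from the thread; the function name and thresholds are hypothetical, but this kind of metric is a common way to confirm that a run has entered a degenerate loop before filing a backend bug.

```python
def repetition_ratio(tokens, n=8):
    """Fraction of n-grams in `tokens` that occur more than once.

    A healthy continuation stays near 0; a model stuck in a loop
    pushes this toward 1 as the same n-grams recur.
    """
    if len(tokens) < n:
        return 0.0
    counts = {}
    for i in range(len(tokens) - n + 1):
        gram = tuple(tokens[i : i + n])
        counts[gram] = counts.get(gram, 0) + 1
    total = len(tokens) - n + 1
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / total

# A degenerate tail (same short phrase cycling) scores high,
# while a non-repeating sequence scores zero:
healthy = list(range(100))                  # all n-grams unique
looping = list(range(20)) + [1, 2, 3] * 30  # short cycle, as in a loop
print(repetition_ratio(healthy))  # 0.0
print(repetition_ratio(looping))  # well above 0.5
```

Running this on the model's output token IDs (or even on whitespace-split words) gives a quick, backend-independent signal for when the loop begins, which helps narrow whether the trigger is a specific context length or a specific quantization.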
DISCOVERED
3h ago
2026-04-29
PUBLISHED
5h ago
2026-04-29
AUTHOR
No_Algae1753