Mistral Medium 3.5 128B loops on Q4_K_XL
OPEN_SOURCE
REDDIT // 3h ago · MODEL RELEASE


A Reddit user reports that Mistral Medium 3.5 128B, running locally at Q4_K_XL on an M2 Max with 96 GB of memory, starts repeating or looping after roughly 500 to 1000 tokens even on the latest llama.cpp build. The thread is framed as a troubleshooting question, with uncertainty about whether the behavior comes from llama.cpp, Unsloth, or the quantization/inference stack rather than the model itself.
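For reports like this, a common first step is to rule out degenerate sampling before blaming the quant or the backend. A hypothetical rerun with explicit sampling flags (model filename and prompt are placeholders, not taken from the thread) using llama.cpp's `llama-cli` might look like:

```shell
# Rerun with explicit sampling settings: a mild repeat penalty over a longer
# window often distinguishes a sampling-config issue from a genuine
# quantization or backend bug. Paths and values are illustrative.
./llama-cli -m mistral-medium-3.5-128b-Q4_K_XL.gguf \
  -c 8192 -n 2048 \
  --temp 0.7 --repeat-penalty 1.1 --repeat-last-n 256 \
  -p "Write a long essay about alpine ecosystems."
```

If looping persists across sampler settings and prompts, the quantization or inference stack becomes the more likely culprit.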

// ANALYSIS

This reads more like a long-context serving or quantization instability than a model-release story, because the failure appears only after sustained generation and the reporter is already on a current backend build. The report concerns local inference rather than an official announcement, and the root cause remains unconfirmed among llama.cpp, Unsloth, and the quantization stack.
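Whatever the root cause turns out to be, the looping behavior itself is easy to flag mechanically. A minimal sketch (function name and thresholds are illustrative, not from the thread) that detects a generation whose tail is one n-gram repeated back to back:

```python
def detect_loop(tokens, ngram=8, min_repeats=3):
    """Return True if the tail of `tokens` is the same n-gram
    repeated at least `min_repeats` times in a row.

    `tokens` can be any sequence (token IDs, words, characters).
    """
    window = ngram * min_repeats
    if len(tokens) < window:
        return False
    tail = list(tokens[-window:])
    unit = tail[:ngram]
    # Compare each consecutive ngram-sized chunk of the tail to the first one.
    return all(
        tail[i * ngram:(i + 1) * ngram] == unit
        for i in range(min_repeats)
    )
```

A check like this, run over sliding windows of the output, would let a harness reproduce the "loops after 500-1000 tokens" claim across quant levels and backend builds automatically.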

// TAGS
mistral · mistral-medium · llama.cpp · unsloth · quantization · local-llm · apple-silicon · inference-bug

DISCOVERED

3h ago

2026-04-29

PUBLISHED

5h ago

2026-04-29

RELEVANCE

7/10

AUTHOR

No_Algae1753