OPEN_SOURCE
REDDIT // 4h ago // NEWS
Mistral Medium 3.5 128B struggles in llama.cpp
A LocalLLaMA user asks whether Mistral Medium 3.5 128B is “broken” in llama.cpp after testing Bartowski Q4 quants on Vulkan with the latest main branch. They report that the model remains coherent but feels unusually weak, with shallow knowledge depth and poor coding results compared with Magistral Small, and they’re asking whether others have had better results in vLLM or with different quantizations.
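For context on what such a local test looks like, here is a minimal sketch using the llama-cpp-python bindings. The GGUF filename, context size, and sampling values are placeholders (the thread does not state exact settings), and full GPU offload assumes a Vulkan-enabled (or other GPU) build of llama.cpp.

```python
# Minimal sketch of the kind of local test described in the thread,
# via the llama-cpp-python bindings. Filename and sampling values
# are placeholders -- the thread does not give exact settings.
from llama_cpp import Llama

llm = Llama(
    model_path="Mistral-Medium-3.5-128B-Q4_K_M.gguf",  # hypothetical Bartowski quant filename
    n_ctx=8192,        # context window; adjust to available VRAM
    n_gpu_layers=-1,   # offload all layers; needs a Vulkan (or other GPU) build
    verbose=False,
)

out = llm.create_chat_completion(
    messages=[{"role": "user",
               "content": "Write a Python function that merges two sorted lists."}],
    temperature=0.15,  # assumed low temperature for coding; check the model card
    top_p=0.95,
    max_tokens=512,
)
print(out["choices"][0]["message"]["content"])
```

Getting the chat template, sampling parameters, or backend build wrong at any of these points can degrade output quality without making the model incoherent, which matches the behavior the poster describes.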
// ANALYSIS
This looks more like a local inference / quantization mismatch report than evidence that the base model is fundamentally broken.
- The thread centers on llama.cpp, Vulkan, and Q4 quants, so the observed weakness could stem from backend support, quantization quality, or prompt/sampling settings rather than the model weights themselves.
- The user compares it against Magistral Small and finds worse coding and reasoning performance, which points to a configuration or compatibility issue worth cross-checking in other runtimes such as vLLM (see the sketch after this list).
- Since Mistral positions Medium 3.5 as a 128B dense open-weight flagship for coding and agentic work, reports like this matter: they may indicate that some local stacks are not yet reproducing the intended behavior.
- As a community signal it is useful but not conclusive: one Reddit thread is enough to flag a possible issue, not to establish a model-wide regression.
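One practical way to separate a quantization/backend problem from a genuine model problem is to run the same prompt at higher precision in another runtime and compare outputs. Below is a minimal sketch using vLLM's offline API; the Hugging Face model identifier is a placeholder, and serving a 128B dense model this way assumes multi-GPU hardware (hence the tensor parallelism setting).

```python
# Cross-check sketch: run the same prompt in vLLM at bf16 precision
# and compare against the llama.cpp Q4 output. Model id is a placeholder.
from vllm import LLM, SamplingParams

llm = LLM(
    model="mistralai/Mistral-Medium-3.5-128B",  # hypothetical HF repo id
    tensor_parallel_size=4,                     # a 128B dense model needs several GPUs
)

params = SamplingParams(temperature=0.15, top_p=0.95, max_tokens=512)
prompt = "Write a Python function that merges two sorted lists."

outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)
```

If the vLLM output on the same prompt and sampling settings is markedly stronger, the problem likely sits in the llama.cpp path (quant, backend, or chat template) rather than in the weights, which is the distinction the thread is trying to make.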
// TAGS
mistral · llamacpp · local-llm · quantization · vulkan · open-weights · coding · reasoning
DISCOVERED
4h ago
2026-04-30
PUBLISHED
8h ago
2026-04-30
RELEVANCE
7/10
AUTHOR
EmPips