OPEN_SOURCE
REDDIT // BENCHMARK RESULT
Mistral Medium 3.5 Q3 disappoints on 3x3090
A Reddit user says Mistral Medium 3.5 runs fast on three 3090s, but the Q3_K_M quantization hurts output quality, especially for code and structured markup. The base model is pitched as a 128B flagship for coding, reasoning, and long-context work, so this reads more like a quantization warning than a verdict on the model itself.
// ANALYSIS
The hot take: speed is not the story here. Q3_K_M is a brutally aggressive local quant, and the failure mode shows up exactly where you'd expect it first: syntax-sensitive generation and formatting discipline.
- The GGUF maintainer labels Q3_K_M as low quality and recommends the Q4/Q5 tiers for better results
- The user's report that Python output was acceptable while SVG/HTML broke fits the usual pattern of low-bit quantization degrading structured output
- Mistral's own positioning for Medium 3.5 is coding, reasoning, and agentic work, so this is a deployment-quality issue, not necessarily a model-capability issue
- For local inference on consumer GPUs, the practical takeaway is to test a higher quant before concluding the base model is weak at code generation
- The three-3090 setup looks viable for throughput, but throughput can't rescue a weak quant when correctness matters more than raw speed
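The "test before you conclude" advice above can be partly automated: since the reported failure mode is broken SVG/HTML rather than bad Python, a quick well-formedness check on generated markup gives a cheap pass/fail signal per quant. A minimal sketch, assuming you collect output strings from each quant yourself (the sample strings below are illustrative stand-ins, not actual model generations):

```python
# Minimal sketch: score structured-output quality by checking whether
# generated SVG/XML snippets parse. This is a proxy metric, not a full
# eval; the sample strings stand in for real model outputs per quant.
import xml.etree.ElementTree as ET


def is_well_formed_xml(text: str) -> bool:
    """Return True if `text` parses as XML (covers SVG and XHTML snippets)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def well_formed_rate(outputs: list[str]) -> float:
    """Fraction of outputs that parse cleanly."""
    if not outputs:
        return 0.0
    return sum(is_well_formed_xml(o) for o in outputs) / len(outputs)


# Illustrative stand-ins: what a higher quant might emit vs. a low-bit
# quant that drops closing tags (the failure pattern in the report).
good_svg = '<svg xmlns="http://www.w3.org/2000/svg"><rect width="10" height="10"/></svg>'
broken_svg = '<svg xmlns="http://www.w3.org/2000/svg"><rect width="10" height="10">'

print(well_formed_rate([good_svg, good_svg]))    # hypothetical Q5 outputs
print(well_formed_rate([good_svg, broken_svg]))  # hypothetical Q3_K_M outputs
```

Running the same prompts through Q3_K_M and a Q4/Q5 quant and comparing rates would separate quantization damage from genuine model weakness.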
// TAGS
llm · open-weights · quantization · code-generation · ai-coding · gpu · mistral-medium-3-5
DISCOVERED
2026-05-02
PUBLISHED
2026-05-01
RELEVANCE
8/10
AUTHOR
jacek2023