OPEN_SOURCE
REDDIT // BENCHMARK RESULT
Mistral Medium 3.5 Q3 disappoints on 3x3090
A Reddit user says Mistral Medium 3.5 runs fast on three 3090s, but the Q3_K_M quantization hurts output quality, especially for code and structured markup. The base model is pitched as a 128B flagship for coding, reasoning, and long-context work, so this reads more like a quantization warning than a verdict on the model itself.
// ANALYSIS
The hot take: speed is not the story here. Q3_K_M is a brutally aggressive local quant, and the failure mode shows up exactly where you'd expect it first: syntax-sensitive generation and formatting discipline.
- The GGUF maintainer labels Q3_K_M as low quality and recommends the Q4/Q5 tiers for better results
- The user's report that Python output was acceptable while SVG/HTML broke fits the usual pattern of low-bit quantization degrading structured output
- Mistral's own positioning for Medium 3.5 is coding, reasoning, and agentic work, so this is a deployment-quality issue, not necessarily a model-capability issue
- For local inference on consumer GPUs, the practical takeaway is to test a higher quant before concluding the base model is weak at code generation
- The three-3090 setup looks viable for throughput, but throughput can't rescue a weak quant when correctness matters more than raw speed
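The "test before you conclude" advice above can be partly automated: since the reported failure mode is broken SVG/HTML rather than bad Python, a quick well-formedness check on generated markup gives a cheap pass/fail signal per quant. A minimal sketch, assuming you collect output strings from each quant yourself (the sample strings below are illustrative stand-ins, not actual model generations):

```python
# Minimal sketch: score structured-output quality by checking whether
# generated SVG/XML snippets parse. This is a proxy metric, not a full
# eval; the sample strings stand in for real model outputs per quant.
import xml.etree.ElementTree as ET


def is_well_formed_xml(text: str) -> bool:
    """Return True if `text` parses as XML (covers SVG and XHTML snippets)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def well_formed_rate(outputs: list[str]) -> float:
    """Fraction of outputs that parse cleanly."""
    if not outputs:
        return 0.0
    return sum(is_well_formed_xml(o) for o in outputs) / len(outputs)


# Illustrative stand-ins: what a higher quant might emit vs. a low-bit
# quant that drops closing tags (the failure pattern in the report).
good_svg = '<svg xmlns="http://www.w3.org/2000/svg"><rect width="10" height="10"/></svg>'
broken_svg = '<svg xmlns="http://www.w3.org/2000/svg"><rect width="10" height="10">'

print(well_formed_rate([good_svg, good_svg]))    # hypothetical Q5 outputs
print(well_formed_rate([good_svg, broken_svg]))  # hypothetical Q3_K_M outputs
```

Running the same prompts through Q3_K_M and a Q4/Q5 quant and comparing rates would separate quantization damage from genuine model weakness.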
// TAGS
llm · open-weights · quantization · code-generation · ai-coding · gpu · mistral-medium-3-5
DISCOVERED
2026-05-02
PUBLISHED
2026-05-01
RELEVANCE
8/10
AUTHOR
jacek2023