OPEN_SOURCE
REDDIT // 25d ago · BENCHMARK RESULT
Mistral Small 4 vision draws early backlash
A LocalLLaMA discussion reports that Mistral Small 4 misreads a straightforward concert photo even through Mistral’s official API, hallucinating stadium elements that are not present. The post contrasts this with much stronger outputs from smaller competitors and notes older Mistral small models did not show the same failure pattern.
// ANALYSIS
The hot take is that Mistral Small 4 may have shipped with a meaningful real-world vision reliability gap despite strong “unified multimodal” positioning.
- The failure mode is not subtle: the model invents core scene structure (stadium, track, vehicles), which breaks trust for visual workflows.
- Because the author reproduced the issue on the official API, the thread shifts blame away from local quantization/runtime setup.
- Community comparisons to Qwen and prior Mistral small variants suggest a possible regression rather than normal variance.
- For developers, this looks like a “benchmark vs. product reality” warning: run task-specific image evals before adopting Small 4 in production.
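One way to make the last point concrete is a small hallucination check run over your own images. The sketch below is illustrative only and does not come from the thread. `ImageCase`, `score_case`, and `run_eval` are hypothetical names, and `describe` stands in for whatever function you write to send an image to the model and return its text description (no specific vendor API is assumed). The check is a simple keyword match, so it catches blatant inventions such as a stadium in a concert photo but not subtler errors.

```python
import re
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable


@dataclass(frozen=True)
class ImageCase:
    """One hand-labelled test image (hypothetical structure)."""
    image_path: str
    must_mention: FrozenSet[str]      # words a correct description should contain
    must_not_mention: FrozenSet[str]  # words that would indicate a hallucination


def _words(text: str) -> set:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score_case(case: ImageCase, description: str) -> dict:
    """Compare one model description against the labels for that image."""
    words = _words(description)
    missing = sorted(case.must_mention - words)
    hallucinated = sorted(case.must_not_mention & words)
    return {
        "image": case.image_path,
        "missing": missing,
        "hallucinated": hallucinated,
        "passed": not missing and not hallucinated,
    }


def run_eval(cases: Iterable[ImageCase], describe: Callable[[str], str]) -> dict:
    """Run every case through `describe` (your model wrapper) and summarize."""
    results = [score_case(case, describe(case.image_path)) for case in cases]
    passed = sum(1 for r in results if r["passed"])
    pass_rate = passed / len(results) if results else 0.0
    return {"pass_rate": pass_rate, "results": results}
```

To use it, label a few dozen images from your actual workload, pass in a `describe` wrapper for each candidate model, and compare pass rates. Exact word matching is a deliberate simplification; synonyms and plurals would need a looser matcher or a human review pass.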
// TAGS
mistral-small-4 · mistral · llm · multimodal · benchmark · api · open-weights
DISCOVERED
25d ago
2026-03-17
PUBLISHED
25d ago
2026-03-17
RELEVANCE
8/10
AUTHOR
EffectiveCeilingFan