OPEN_SOURCE
REDDIT // 6h ago · MODEL RELEASE
Mistral Medium 3.5 Gets MLX 4-Bit
A community conversion puts Mistral Medium 3.5 128B into an Apple Silicon-friendly MLX 4-bit format at roughly 70 GB, while preserving vision, thinking mode, tool calling, and 256K context. The release also calls out a known repetition-loop problem that appears to be model-level rather than conversion-specific.
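To make the selective-quantization idea concrete, here is a minimal sketch using mlx-lm's `convert` with a `quant_predicate` hook to leave the output head and projector at original precision. The repo id, layer-name matches, and the hook itself are assumptions for illustration; the actual release was produced with mlx-vlm tooling, whose converter differs in detail.

```python
# Hedged sketch of selective 4-bit quantization, assuming a recent mlx-lm
# that exposes convert(..., quant_predicate=...). Names below are placeholders.
from mlx_lm import convert

def keep_sensitive_layers(path, module, config):
    # Return False to skip quantizing a module. Skipping the output head and
    # the multimodal projector mirrors the release notes' recipe; the exact
    # layer names here are assumptions, not confirmed by the release.
    if "lm_head" in path or "multi_modal_projector" in path:
        return False
    return True  # quantize everything else with the defaults below

convert(
    hf_path="mistralai/Mistral-Medium-3.5",    # hypothetical repo id
    mlx_path="./mistral-medium-3.5-mlx-4bit",
    quantize=True,
    q_bits=4,
    q_group_size=64,
    quant_predicate=keep_sensitive_layers,
)
```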
// ANALYSIS
Useful for Mac-based local inference, but not yet a clean “download and forget” release. The conversion is strong on capability preservation; the known looping behavior is the kind of issue that can make long agent runs painful.
- The footprint is still hefty at about 70 GB, so this targets high-RAM Apple Silicon machines rather than typical laptops
- Keeping the vision encoder in BF16 and leaving the projector and lm_head unquantized should preserve multimodal quality better than text-only quantizations
- Thinking mode and tool calling surviving the conversion is the main reason this matters for agentic workflows, not just chat
- The local patch for the mlx-vlm sanitize bug suggests conversion hygiene mattered here; provenance is worth checking before benchmarking
- Repetition loops are the biggest practical risk, so sampling settings and repeat-penalty tuning will matter more than headline token speed; see the sketch after this list
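Since repetition loops are the headline risk, a hedged starting point for taming them with mlx-lm's sampling helpers is sketched below (text-only path; vision inputs would go through mlx-vlm instead). The model path is a placeholder and the penalty values are starting points to tune, not the release's recommendation.

```python
# Hedged sketch: repetition-penalty sampling with mlx-lm's helpers.
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler, make_logits_processors

# Local path or HF repo id for the 4-bit conversion (placeholder).
model, tokenizer = load("./mistral-medium-3.5-mlx-4bit")

messages = [{"role": "user", "content": "Plan a three-step refactor of a flaky test suite."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# A mild penalty applied over a longer lookback window targets loop cycles
# that the short default window can miss; both values are tuning knobs.
sampler = make_sampler(temp=0.7, top_p=0.95)
logits_processors = make_logits_processors(
    repetition_penalty=1.1,
    repetition_context_size=256,
)

print(generate(
    model,
    tokenizer,
    prompt=prompt,
    max_tokens=512,
    sampler=sampler,
    logits_processors=logits_processors,
))
```

If loops persist at these settings, raising `repetition_context_size` before raising the penalty tends to degrade output quality less, since a heavy penalty also suppresses legitimate reuse of terms.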
// TAGS
mistral-medium-3.5 · llm · multimodal · reasoning · open-weights · inference
DISCOVERED
2026-05-01 (6h ago)
PUBLISHED
2026-04-30 (9h ago)
RELEVANCE
9/10
AUTHOR
ex-arman68