OPEN_SOURCE
REDDIT // 6h ago · MODEL RELEASE
Mistral Medium 3.5 Gets MLX 4-Bit
A community conversion puts Mistral Medium 3.5 128B into an Apple Silicon-friendly MLX 4-bit format at roughly 70 GB, while preserving vision, thinking mode, tool calling, and 256K context. The release also calls out a known repetition-loop problem that appears to be model-level rather than conversion-specific.
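To make the selective-quantization idea concrete, here is a minimal sketch using mlx-lm's `convert` with a `quant_predicate` hook to leave the output head and projector at original precision. The repo id, layer-name matches, and the hook itself are assumptions for illustration; the actual release was produced with mlx-vlm tooling, whose converter differs in detail.

```python
# Hedged sketch of selective 4-bit quantization, assuming a recent mlx-lm
# that exposes convert(..., quant_predicate=...). Names below are placeholders.
from mlx_lm import convert

def keep_sensitive_layers(path, module, config):
    # Return False to skip quantizing a module. Skipping the output head and
    # the multimodal projector mirrors the release notes' recipe; the exact
    # layer names here are assumptions, not confirmed by the release.
    if "lm_head" in path or "multi_modal_projector" in path:
        return False
    return True  # quantize everything else with the defaults below

convert(
    hf_path="mistralai/Mistral-Medium-3.5",    # hypothetical repo id
    mlx_path="./mistral-medium-3.5-mlx-4bit",
    quantize=True,
    q_bits=4,
    q_group_size=64,
    quant_predicate=keep_sensitive_layers,
)
```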
// ANALYSIS
Useful for Mac-based local inference, but not yet a clean “download and forget” release. The conversion is strong on capability preservation; the known looping behavior is the kind of issue that can make long agent runs painful.
- The footprint is still hefty at about 70 GB, so this targets high-RAM Apple Silicon machines rather than typical laptops
- Keeping the vision encoder in BF16 and leaving the projector and lm_head unquantized should preserve multimodal quality better than text-only quantizations
- Thinking mode and tool calling surviving the conversion is the main reason this matters for agentic workflows, not just chat
- The local patch for the mlx-vlm sanitize bug suggests conversion hygiene mattered here; provenance is worth checking before benchmarking
- Repetition loops are the biggest practical risk, so sampling settings and repeat-penalty tuning will matter more than headline token speed; see the sketch after this list
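Since repetition loops are the headline risk, a hedged starting point for taming them with mlx-lm's sampling helpers is sketched below (text-only path; vision inputs would go through mlx-vlm instead). The model path is a placeholder and the penalty values are starting points to tune, not the release's recommendation.

```python
# Hedged sketch: repetition-penalty sampling with mlx-lm's helpers.
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler, make_logits_processors

# Local path or HF repo id for the 4-bit conversion (placeholder).
model, tokenizer = load("./mistral-medium-3.5-mlx-4bit")

messages = [{"role": "user", "content": "Plan a three-step refactor of a flaky test suite."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# A mild penalty applied over a longer lookback window targets loop cycles
# that the short default window can miss; both values are tuning knobs.
sampler = make_sampler(temp=0.7, top_p=0.95)
logits_processors = make_logits_processors(
    repetition_penalty=1.1,
    repetition_context_size=256,
)

print(generate(
    model,
    tokenizer,
    prompt=prompt,
    max_tokens=512,
    sampler=sampler,
    logits_processors=logits_processors,
))
```

If loops persist at these settings, raising `repetition_context_size` before raising the penalty tends to degrade output quality less, since a heavy penalty also suppresses legitimate reuse of terms.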
// TAGS
mistral-medium-3.5 · llm · multimodal · reasoning · open-weights · inference
DISCOVERED
2026-05-01 (6h ago)
PUBLISHED
2026-04-30 (9h ago)
RELEVANCE
9/10
AUTHOR
ex-arman68