Mistral Medium 3.5 Gets MLX 4-Bit
OPEN_SOURCE
REDDIT · 6h ago · MODEL RELEASE


A community conversion puts Mistral Medium 3.5 128B into Apple Silicon-friendly MLX 4-bit format at roughly 70 GB, while preserving vision, thinking mode, tool calling, and 256K context. The release also calls out a known repetition-loop problem that appears to be model-level rather than conversion-specific.
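The selective-precision scheme described above (4-bit weights, but vision encoder, projector, and lm_head left in higher precision) is typically driven by a per-layer filter passed to the converter. A minimal sketch of such a filter follows; the layer names and the hook itself are illustrative assumptions, not taken from this release's actual config, and the exact signature varies by mlx-lm version.

```python
# Hypothetical per-layer quantization filter, in the style of the
# quant-predicate hooks that MLX converters accept. The layer-name
# patterns below are assumptions for illustration only.
SKIP_PATTERNS = ("vision_tower", "multi_modal_projector", "lm_head")

def should_quantize(layer_path: str) -> bool:
    """Return True if this layer should be quantized to 4-bit,
    False if it should keep its original (BF16) precision."""
    return not any(pattern in layer_path for pattern in SKIP_PATTERNS)
```

With a filter like this, the text decoder drops to 4-bit while the multimodal path stays full-precision, which is the likely reason image quality survives better than in text-only quantizations.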

// ANALYSIS

Useful for Mac-based local inference, but this is not a clean “download and forget” release yet. The conversion is strong on capability preservation, but the known looping behavior is the kind of issue that can make long agent runs painful.

  • The footprint is still hefty at about 70 GB, so this targets high-RAM Apple Silicon machines rather than typical laptops
  • Keeping the vision encoder in BF16 and leaving projector / lm_head unquantized should preserve multimodal quality better than text-only quantizations
  • Thinking mode and tool calling surviving the conversion is the main reason this matters for agentic workflows, not just chat
  • The local patch for the mlx-vlm sanitize bug suggests conversion hygiene mattered here; provenance is worth checking before benchmarking
  • Repetition loops are the biggest practical risk, so sampling settings and repeat penalty tuning will matter more than headline token speed
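Since the looping issue looks model-level, the practical mitigation is on the sampling side. As a reference point, a minimal sketch of a CTRL-style repetition penalty (the scheme most inference stacks, including MLX ones, implement) is below; this is a generic illustration, not the code from this release.

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.3):
    """CTRL-style repetition penalty: dampen the logits of tokens
    that already appear in the generated sequence.

    logits: list of floats, one per vocabulary token.
    generated_ids: token ids produced so far.
    penalty: > 1.0 discourages repeats; 1.0 is a no-op.
    """
    out = list(logits)
    for tok in set(generated_ids):
        if out[tok] > 0:
            out[tok] /= penalty  # shrink positive logits toward zero
        else:
            out[tok] *= penalty  # push negative logits further down
    return out
```

For long agent runs, a penalty in the 1.1-1.3 range plus a sane temperature is usually the first knob to try before concluding the quantization itself is at fault.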
// TAGS
mistral-medium-3.5 · llm · multimodal · reasoning · open-weights · inference

DISCOVERED

6h ago

2026-05-01

PUBLISHED

9h ago

2026-04-30

RELEVANCE

9/10

AUTHOR

ex-arman68