OPEN_SOURCE
REDDIT · INFRASTRUCTURE · 9d ago
MLX vision workloads hit macOS memory wall
This Reddit post reports that document-image analysis with Qwen3.5 vision models runs out of RAM when served through MLX in LM Studio on a Mac with about 50GB available, while the same workload succeeds with the GGUF build of the same model. The poster suspects an MLX-specific memory behavior in the vision path and asks for confirmation or a known workaround, since MLX prefill is otherwise much faster on Apple Silicon.
// ANALYSIS
Hot take: this reads like a runtime/memory-management issue in the MLX vision stack, not a model-size problem.
- The symptom is format-specific: GGUF works, MLX fails, which points to backend allocation behavior rather than the model itself.
- Vision workloads are usually much heavier on transient memory than text-only inference, so image preprocessing/prefill may be amplifying peak usage.
- On Apple Silicon, unified memory can make “available RAM” feel generous until a single spike crosses the line.
- The practical tradeoff is clear: MLX buys speed, but the vision path may still be too memory-hungry for larger document images or certain Qwen3.5 variants.
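The "transient memory" point can be made concrete with a back-of-envelope sketch: prefill materializes activations for every token at once, and a tiled high-resolution document image expands into thousands of visual tokens on top of the text prompt. The figures below (hidden size, layer count, token counts) are illustrative assumptions, not measurements of Qwen3.5 or MLX.

```python
def prefill_activation_gib(n_tokens: int,
                           hidden: int = 4096,
                           layers: int = 40,
                           bytes_per_elem: int = 2) -> float:
    """Rough lower bound on live hidden-state memory during prefill:
    one fp16 hidden-state tensor per layer for every token."""
    return n_tokens * hidden * layers * bytes_per_elem / 2**30

# Assumed workload: a short text prompt vs. the same prompt plus a
# tiled document image that expands to ~8k visual tokens.
text_only  = prefill_activation_gib(512)
with_image = prefill_activation_gib(512 + 8192)

print(f"text-only prefill:  ~{text_only:.2f} GiB")
print(f"text+image prefill: ~{with_image:.2f} GiB")
```

Even this crude estimate shows a >15x jump in peak activation memory from one document image, before counting attention buffers or preprocessing copies, which is exactly the kind of spike that clears a 50GB budget on one backend but not another with a leaner vision path.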
// TAGS
mlx · vision-language-models · qwen · lm-studio · macos · ram · unified-memory · inference
DISCOVERED
2026-04-02
PUBLISHED
2026-04-02
RELEVANCE
8/10
AUTHOR
MrPecunius