Qwen3.5-35B-A3B Hits Disk Wall on 16GB Macs
OPEN_SOURCE ↗
REDDIT · 2h ago · INFRASTRUCTURE


A LocalLLaMA user reports that Qwen3.5-35B-A3B runs acceptably on a 16GB Mac Mini for batch inference under llama.cpp, but turning it into an always-on agent loop exposes a different bottleneck: SSD contention, daemon overhead, and system instability arrive before RAM is exhausted. The post argues that this kind of unattended MoE deployment needs more unified memory, or much stricter process isolation, than a small Apple Silicon box can comfortably provide.
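One practical mitigation for the disk-pressure failure mode described above is to make the agent loop itself disk-aware. The sketch below is not from the post; it is a minimal illustration using Python's standard `shutil.disk_usage`, and the 20GB floor and function names are hypothetical.

```python
import shutil
import time

# Hypothetical free-space floor: below this, the agent backs off rather
# than competing with mmap paging, logs, and background daemons for I/O.
MIN_FREE_BYTES = 20 * 1024**3

def disk_ok(path: str = "/", min_free: int = MIN_FREE_BYTES) -> bool:
    """Return True when the volume holding `path` still has headroom."""
    return shutil.disk_usage(path).free >= min_free

def agent_loop(step, max_steps: int = 100) -> None:
    """Run `step()` repeatedly, pausing while disk pressure is high."""
    for _ in range(max_steps):
        if not disk_ok():
            time.sleep(30)  # back off instead of destabilizing the host
            continue
        step()
```

This does not fix SSD contention, but it turns "system instability" into an explicit, observable pause.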

// ANALYSIS

Hot take: this is less a model-size story than an OS-scheduling story. Once you mix mmap paging, long-lived CLI harnesses, file watchers, and background automation, local agents start competing with the machine itself. Batch mode looks viable on 16GB because MoE activation keeps resident memory modest, but that does not translate cleanly to 24/7 operation. The failure mode was disk pressure and background process churn, not a simple out-of-memory crash, which is the more interesting warning for local-agent builders. Claude Code and Codex-style harnesses are not free host-side companions; their lifecycle behavior matters when stacked with a paging model. A 9B/4B routing layer plus a larger on-demand model is a more realistic architecture on 16GB than letting the 35B sit resident all the time.
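The routing-layer idea can be sketched as a dispatcher that keeps a small model resident and only wakes the 35B for requests that look hard. This is a hypothetical illustration, not code from the post: the model names, the length threshold, and the keyword heuristic are all placeholders.

```python
# Illustrative routing sketch: cheap resident model by default,
# large MoE loaded on demand only when the prompt looks hard.
HARD_HINTS = ("refactor", "multi-file", "architecture", "prove")

def pick_model(prompt: str) -> str:
    """Route a request to the small resident model or the big MoE."""
    lowered = prompt.lower()
    if len(prompt) > 2000 or any(hint in lowered for hint in HARD_HINTS):
        return "qwen3.5-35b-a3b"  # loaded on demand, unloaded after use
    return "small-resident-4b"    # always warm, modest memory footprint
```

In a real deployment the hard part is not the heuristic but the lifecycle around it: loading the 35B, serving the request, and actually releasing its mapped pages afterward.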

// TAGS
qwen3-5-35b-a3b · llama.cpp · inference · agent · self-hosted · macos · cli

DISCOVERED
2h ago · 2026-04-28

PUBLISHED
4h ago · 2026-04-28

RELEVANCE
8/10

AUTHOR
Joozio