OPEN_SOURCE ↗
REDDIT // 6h ago // NEWS
Qwen3.5-35B-A3B Pushes Mac Memory Limits
A Reddit thread asks whether an M5 Pro with 24GB unified memory can comfortably run Qwen3.5-35B-A3B or dense 27B models locally. The replies lean hard toward 48GB, with users saying 24GB runs into memory pressure fast once context length grows.
// ANALYSIS
The short answer is that 24GB may be fine for general development, but it is not the comfortable tier for local 35B-class LLM hobby work. If you want room for larger context, fewer compromises, and less babysitting, 48GB is the safer buy.
- Qwen3.5-35B-A3B is a 35B-total MoE model with 3B active parameters, so it is efficient for its class but still not "small" in memory terms.
- The thread's comments are consistent: 24GB is described as barely enough, while 48GB avoids the yellow-zone memory pressure people hit on Apple Silicon.
- For local inference, unified memory matters as much as raw CPU/GPU speed because KV cache and context quickly eat the available headroom.
- If the goal is occasional experimentation, 24GB can work with aggressive quantization and shorter contexts; if the goal is a serious local LLM setup, 48GB is the practical minimum.
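The memory pressure the thread describes can be sketched with a back-of-envelope calculation: quantized weights are roughly `total_params × bits / 8`, and the KV cache grows linearly with context length. The architecture numbers below (layers, KV heads, head dimension) are illustrative assumptions for a 35B-class model, not the published Qwen3.5-35B-A3B config.

```python
# Rough memory estimate for a local 35B-class MoE model.
# All architecture numbers are illustrative assumptions,
# NOT the published Qwen3.5-35B-A3B configuration.

def weights_gb(total_params_b: float, bits_per_weight: float) -> float:
    """Memory for quantized weights in GB (decimal)."""
    return total_params_b * 1e9 * bits_per_weight / 8 / 1e9

def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                context_len: int, bytes_per_elem: int = 2) -> float:
    """KV cache in GB: keys + values for every layer and cached token."""
    return (2 * n_layers * n_kv_heads * head_dim
            * context_len * bytes_per_elem) / 1e9

# Assumed (hypothetical) config: 48 layers, 8 KV heads, head dim 128, fp16 cache.
w = weights_gb(35, 4)                       # 4-bit weights -> 17.5 GB
kv_8k = kv_cache_gb(48, 8, 128, 8_192)      # ~1.6 GB at 8k context
kv_64k = kv_cache_gb(48, 8, 128, 65_536)    # ~12.9 GB at 64k context

print(f"weights: {w:.1f} GB, KV@8k: {kv_8k:.1f} GB, KV@64k: {kv_64k:.1f} GB")
```

Under these assumptions the weights alone take ~17.5 GB, so on a 24GB machine the OS, apps, and KV cache compete for the remaining ~6.5 GB, while a 48GB machine absorbs even a 64k-token cache comfortably. That matches the thread's "barely enough vs. comfortable" split.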
// TAGS
llm · self-hosted · inference · qwen3-5-35b-a3b · apple-silicon
DISCOVERED
6h ago
2026-04-18
PUBLISHED
7h ago
2026-04-18
RELEVANCE
6 / 10
AUTHOR
umutkarakoc