OPEN_SOURCE ↗
REDDIT // 2h ago · TUTORIAL
Qwen3.5-35B-A3B Hits 9 TPS on MacBook Air
This Reddit tutorial shows how to run Qwen3.5-35B-A3B locally on a 16GB M3 MacBook Air, pairing an Unsloth GGUF quant with llama.cpp's memory-mapped (mmap) model loading. The author reports about 8.9 TPS and shares a working llama-server command plus local API and web UI endpoints.
// ANALYSIS
Hot take: this is a useful local-LLM hack post, not a model launch, and the real value is the memory-mapping workflow rather than the raw speed claim.
- Strong practical signal for people trying to run oversized GGUFs on Apple silicon with limited unified memory.
- The key takeaway is the mmap setup; the rest is mostly tuning flags and a specific quant choice.
- The 8.9 TPS number is anecdotal and hardware/quant dependent, so read it as a benchmark note rather than a repeatable spec.
- Good fit for builders experimenting with llama.cpp, local inference, and Mac optimization.
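The workflow the post describes can be sketched roughly as follows. This is a hypothetical reconstruction, not the author's exact command: the model filename and tuning values (`-c`, `-ngl`) are placeholders. llama.cpp memory-maps GGUF weights by default, so a quant larger than free RAM can still load, with pages faulted in from disk on demand.

```shell
# Launch llama-server with a local GGUF quant (placeholder filename).
# mmap is llama.cpp's default load path; --no-mmap would disable it.
llama-server \
  -m Qwen3.5-35B-A3B-Q4_K_M.gguf \
  -c 4096 \
  -ngl 99 \
  --host 127.0.0.1 --port 8080

# The server exposes a web UI at http://127.0.0.1:8080 and an
# OpenAI-compatible API, e.g.:
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Hello"}]}'
```

Keeping the context size modest matters on 16GB of unified memory, since the KV cache competes with the memory-mapped weights for RAM.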
// TAGS
qwen · qwen3.5 · gguf · llamacpp · mmap · macbook · apple-silicon · local-llm · inference
DISCOVERED
2h ago
2026-05-01
PUBLISHED
5h ago
2026-05-01
RELEVANCE
7/10
AUTHOR
Sufficient-Bid3874