OPEN_SOURCE ↗
REDDIT // 2h ago · TUTORIAL
Qwen3.5-35B-A3B Hits 9 TPS on MacBook Air
This Reddit tutorial shows how to run Qwen3.5-35B-A3B locally on a 16GB M3 MacBook Air, pairing an Unsloth GGUF quant with llama.cpp's memory-mapped (mmap) model loading. The author reports about 8.9 TPS and shares a working llama-server command plus local API and web UI endpoints.
// ANALYSIS
Hot take: this is a useful local-LLM hack post, not a model launch, and the real value is the memory-mapping workflow rather than the raw speed claim.
- Strong practical signal for people trying to run oversized GGUFs on Apple silicon with limited unified memory.
- The key takeaway is the mmap setup; the rest is mostly tuning flags and a specific quant choice.
- The 8.9 TPS number is anecdotal and hardware/quant dependent, so read it as a benchmark note rather than a repeatable spec.
- Good fit for builders experimenting with llama.cpp, local inference, and Mac optimization.
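The workflow the post describes can be sketched roughly as follows. This is a hypothetical reconstruction, not the author's exact command: the model filename and tuning values (`-c`, `-ngl`) are placeholders. llama.cpp memory-maps GGUF weights by default, so a quant larger than free RAM can still load, with pages faulted in from disk on demand.

```shell
# Launch llama-server with a local GGUF quant (placeholder filename).
# mmap is llama.cpp's default load path; --no-mmap would disable it.
llama-server \
  -m Qwen3.5-35B-A3B-Q4_K_M.gguf \
  -c 4096 \
  -ngl 99 \
  --host 127.0.0.1 --port 8080

# The server exposes a web UI at http://127.0.0.1:8080 and an
# OpenAI-compatible API, e.g.:
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Hello"}]}'
```

Keeping the context size modest matters on 16GB of unified memory, since the KV cache competes with the memory-mapped weights for RAM.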
// TAGS
qwen · qwen3.5 · gguf · llamacpp · mmap · macbook · apple-silicon · local-llm · inference
DISCOVERED
2h ago
2026-05-01
PUBLISHED
5h ago
2026-05-01
RELEVANCE
7/10
AUTHOR
Sufficient-Bid3874