YOU ARE VIEWING ONE ITEM FROM THE AICRIER FEED

Qwen3.5-35B-A3B Hits 9 TPS on MacBook Air

AICrier tracks AI developer news across Product Hunt, GitHub, Hacker News, YouTube, X, arXiv, and more. This page keeps the article you opened front and center while giving you a path into the live feed.

// WHAT AICRIER DOES

7+

TRACKED FEEDS

24/7

SCRAPED FEED

Short summaries, external links, screenshots, relevance scoring, tags, and featured picks for AI builders.

Qwen3.5-35B-A3B Hits 9 TPS on MacBook Air
OPEN LINK ↗
// 50d agoTUTORIAL

Qwen3.5-35B-A3B Hits 9 TPS on MacBook Air

This Reddit tutorial shows how to run Qwen3.5-35B-A3B locally on a 16GB M3 MacBook Air with an Unsloth GGUF quant and llama.cpp using mmap. The author reports about 8.9 TPS and shares a working llama-server command plus local API and web UI endpoints.

// ANALYSIS

Hot take: this is a useful local-LLM hack post, not a model launch, and the real value is the memory-mapping workflow rather than the raw speed claim.

  • Strong practical signal for people trying to run oversized GGUFs on Apple silicon with limited unified memory.
  • The key takeaway is the `--mmap` setup; the rest is mostly tuning flags and a specific quant choice.
  • The 8.9 TPS number is anecdotal and hardware/model-quant dependent, so it should be read as a benchmark note rather than a repeatable spec.
  • Good fit for builders experimenting with llama.cpp, local inference, and Mac optimization.
// TAGS
qwenqwen3.5ggufllamacppmmapmacbookapple-siliconlocal-llminference

DISCOVERED

50d ago

2026-05-01

PUBLISHED

50d ago

2026-05-01

RELEVANCE

7/ 10

AUTHOR

Sufficient-Bid3874