M1 Max MacBooks throttle under LLMs

// 45d ago · INFRASTRUCTURE

A LocalLLaMA user reports that a 64GB M1 Max MacBook Pro starts at around 50 tokens/sec but drops to single digits within minutes while running Qwen 3.5 35B A3B. The post asks whether macOS Tahoe, macOS Sequoia, or the machine itself is the real bottleneck for sustained local-LLM inference.

// ANALYSIS

This looks less like an obsolete machine and more like the physics of sustained load: 35B-class local inference can push Apple silicon into its thermal and power limits quickly, and background activity in macOS Tahoe may be adding drag. Users who want stable throughput on a Mac need to consider model size, quantization, cooling, and OS activity together.

  • A 64GB M1 Max is capable, but 35B models are still heavy enough to exhaust thermal headroom and run into memory-bandwidth ceilings over time.
  • Reports about macOS Tahoe describe higher temperatures, constant fan use, and busy background processes such as WindowServer or Spotlight, while some users say Sequoia runs cooler.
  • For local LLMs, a smaller quantized model usually delivers better sustained tokens/sec than a large model that benchmarks well at first and then throttles.
  • If the slowdown happens in minutes, it is worth checking fan behavior, ambient temperature, display scaling, login items, and indexing before blaming the chip outright.
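
One way to make the "fast at first, slow within minutes" pattern concrete is to log tokens/sec at intervals and compare the opening window against the closing one. The sketch below is a minimal illustration, not a tool from the original post: the function name `summarize_throughput`, the 50% drop threshold, and the sample readings are all hypothetical, chosen only to mirror the roughly 50 tokens/sec to single-digit decline described above.

```python
from statistics import mean


def summarize_throughput(samples, window=3, drop_threshold=0.5):
    """Compare early vs. late throughput in a run.

    samples: list of (elapsed_seconds, tokens_per_sec) tuples in time order.
    window: how many samples to average at the start and at the end.
    drop_threshold: fraction of initial throughput below which the run
        is flagged as throttled (hypothetical default, tune to taste).
    """
    if len(samples) < 2 * window:
        raise ValueError("need at least 2 * window samples")

    rates = [rate for _, rate in samples]
    initial = mean(rates[:window])
    sustained = mean(rates[-window:])
    if initial <= 0:
        raise ValueError("initial throughput must be positive")

    # First timestamp at which throughput falls below the cutoff, if any.
    cutoff = drop_threshold * initial
    first_drop_at = next((t for t, rate in samples if rate < cutoff), None)

    retained = sustained / initial
    return {
        "initial_tps": initial,
        "sustained_tps": sustained,
        "retained_fraction": retained,
        "first_drop_at_s": first_drop_at,
        "throttled": retained < drop_threshold,
    }


# Illustrative readings only (not measurements from the post).
samples = [
    (0, 50.0), (30, 49.0), (60, 48.0), (120, 30.0),
    (180, 12.0), (240, 9.0), (300, 8.0), (360, 7.0),
]
report = summarize_throughput(samples)
print(report)
```

Repeating the same run after changing one variable at a time (a smaller quantized model, a cooler room, Spotlight indexing finished) and comparing the retained fraction would give a rough sense of which factor matters most on a given machine.
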
// TAGS
macbook-pro, llm, inference, gpu, edge-ai

DISCOVERED

45d ago

2026-04-17

PUBLISHED

46d ago

2026-04-17

RELEVANCE

7/10

AUTHOR

Ayumu_Kasuga