M1 Max MacBooks throttle under LLMs
OPEN_SOURCE
REDDIT // 14h ago · INFRASTRUCTURE


A LocalLLaMA user reports that a 64GB M1 Max MacBook Pro starts at roughly 50 tokens/sec but drops to single digits within minutes while running Qwen 3.5 35B A3B. The post asks whether macOS Tahoe, macOS Sequoia, or the hardware itself is the real bottleneck for sustained local-LLM inference.

// ANALYSIS

This reads less like a dead-end machine and more like sustained-load physics: 35B-class local inference can push Apple silicon into thermal and power limits fast, and Tahoe-era background work may be adding drag. Users who want stable throughput on a Mac need to think about model size, quantization, cooling, and OS activity together.

  • A 64GB M1 Max is capable, but 35B models are still heavy enough to expose thermal headroom and memory-bandwidth ceilings over time.
  • Reports around macOS Tahoe point to higher temperatures, constant fan activity, and heavier background load from processes such as WindowServer and Spotlight indexing, while some users say Sequoia runs cooler.
  • For local LLMs, smaller quantized models usually give better sustained tokens/sec than chasing a large model that initially benchmarks well and then throttles.
  • If the slowdown happens in minutes, it is worth checking fan behavior, ambient temperature, display scaling, login items, and indexing before blaming the chip outright.
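The pattern described above (strong opening numbers, steep decay within minutes) is easiest to see by measuring throughput over time rather than trusting a single benchmark figure. A minimal sketch, with hypothetical names and synthetic timestamps standing in for real inference logs, that bins token timestamps into fixed windows so throttling shows up as a declining curve:

```python
def throughput_curve(token_times, bucket=10.0):
    """Bin token-emission timestamps (seconds) into `bucket`-second
    windows and return tokens/sec for each window. A flat curve means
    stable throughput; a cliff suggests thermal or power throttling."""
    if not token_times:
        return []
    t0 = token_times[0]
    last_bin = int((token_times[-1] - t0) // bucket)
    counts = [0] * (last_bin + 1)
    for t in token_times:
        counts[int((t - t0) // bucket)] += 1
    return [c / bucket for c in counts]

# Synthetic example mimicking the post's fast-start / quick-throttle
# pattern: 50 tok/s for the first 30 s, then 5 tok/s for the next 30 s.
fast = [i / 50 for i in range(1500)]      # timestamps 0.00 .. 29.98 s
slow = [30 + i / 5 for i in range(150)]   # timestamps 30.0 .. 59.8 s
curve = throughput_curve(fast + slow, bucket=10.0)
# curve → [50.0, 50.0, 50.0, 5.0, 5.0, 5.0]
```

Feeding it per-token timestamps from a real run (most local-inference tools can log them) separates "the model is slow" from "the machine started fast and then throttled."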
// TAGS
macbook-pro · llm · inference · gpu · edge-ai

DISCOVERED

14h ago

2026-04-17

PUBLISHED

15h ago

2026-04-17

RELEVANCE

7 / 10

AUTHOR

Ayumu_Kasuga