OPEN_SOURCE
REDDIT // 14h ago · INFRASTRUCTURE
M1 Max MacBooks throttle under LLMs
A LocalLLaMA user says a 64GB M1 Max MacBook Pro starts around 50 tokens/sec but falls to single digits within minutes while running Qwen 3.5 35B A3B. The post asks whether Tahoe, Sequoia, or the machine itself is the real bottleneck for sustained local-LLM inference.
// ANALYSIS
This reads less like a dead-end machine and more like sustained-load physics: 35B-class local inference can push Apple silicon into thermal and power limits fast, and Tahoe-era background work may be adding drag. Users who want stable throughput on a Mac need to think about model size, quantization, cooling, and OS activity together.
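The model-size point can be made concrete with back-of-envelope arithmetic: decode on Apple silicon is usually memory-bandwidth bound, so the ceiling on tokens/sec is roughly bandwidth divided by the bytes of weights read per token. The numbers below are illustrative assumptions, not from the post: ~400 GB/s peak unified-memory bandwidth for an M1 Max, ~0.5 bytes/parameter at a 4-bit quant, and an MoE "A3B" model streaming only its ~3B active parameters per token.

```python
def decode_ceiling_tps(active_params_b: float, bytes_per_param: float,
                       bandwidth_gbs: float = 400.0) -> float:
    """Rough upper bound on tokens/sec if every decoded token must
    stream the active weights once from memory (ignores compute limits,
    KV-cache traffic, and any cache reuse)."""
    bytes_per_token = active_params_b * 1e9 * bytes_per_param
    return bandwidth_gbs * 1e9 / bytes_per_token

# Dense 35B at 4-bit: all weights read per token.
dense_35b = decode_ceiling_tps(35, 0.5)
# MoE with ~3B active parameters at 4-bit: far less traffic per token.
moe_a3b = decode_ceiling_tps(3, 0.5)
```

Under these assumptions a dense 35B model tops out in the low tens of tokens/sec while the MoE configuration's ceiling is an order of magnitude higher, which is consistent with a fast start that thermal limits then erode.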
- A 64GB M1 Max is capable, but 35B models are still heavy enough to expose thermal headroom and memory-bandwidth ceilings over time.
- Reports around macOS Tahoe point to higher temps, constant fan use, and background processes like WindowServer or Spotlight, while some users say Sequoia feels cooler.
- For local LLMs, smaller quantized models usually give better sustained tokens/sec than chasing a large model that benchmarks well at first and then throttles.
- If the slowdown happens within minutes, it is worth checking fan behavior, ambient temperature, display scaling, login items, and indexing before blaming the chip outright.
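To tell a genuine thermal throttle from a one-off warm-up cost, it helps to log a rolling tokens/sec rather than a single end-of-run average. A minimal sketch, independent of any particular inference library (the `tick` call would go inside your own decode loop; the class name and API are illustrative, not from the post):

```python
import time
from collections import deque
from typing import Deque, Optional

class ThroughputMonitor:
    """Tracks rolling tokens/sec so a gradual throttle shows up as a
    steadily declining rate, not just a lower final average."""

    def __init__(self, window_s: float = 10.0):
        self.window_s = window_s
        self.stamps: Deque[float] = deque()

    def tick(self, now: Optional[float] = None) -> float:
        """Record one generated token; return tokens/sec over the window."""
        now = time.monotonic() if now is None else now
        self.stamps.append(now)
        # Drop timestamps that have aged out of the trailing window.
        while self.stamps and now - self.stamps[0] > self.window_s:
            self.stamps.popleft()
        return len(self.stamps) / self.window_s
```

Calling `mon.tick()` once per decoded token and printing the rate every few seconds makes the pattern in the post visible: a steady fall from ~50 tokens/sec to single digits over minutes points at thermal or power limits rather than model load time.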
// TAGS
macbook-pro · llm · inference · gpu · edge-ai
DISCOVERED
2026-04-17 (14h ago)
PUBLISHED
2026-04-17 (15h ago)
RELEVANCE
7 / 10
AUTHOR
Ayumu_Kasuga