GPT-OSS 120B tops 60 tok/sec on M5 Max

// 54d ago · BENCHMARK RESULT

OpenAI's 117B-parameter MoE model, GPT-OSS 120B, generates more than 60 tokens per second on the MacBook Pro M5 Max, faster than most people read, using 128GB of unified memory and the MLX framework. The result is a notable step for local inference of reasoning-focused models on portable hardware.

// ANALYSIS

"Workstation-class" performance on a laptop weakens the case for cloud dependency in privacy-sensitive professional workflows.

  • The MoE architecture activates only 5.1B parameters per token, so the 120B model reaches throughput typical of much smaller dense models
  • The M5 Max's 614 GB/s memory bandwidth is the critical enabler, since token generation at this scale is limited by how fast weights can be read from memory; the post credits it with roughly doubling large-model local inference performance over prior generations
  • MXFP4 quantization stores weights in a 4-bit format while preserving accuracy, fitting the model within 70GB and leaving ample room for 128k-token context windows on 128GB machines
  • Apache 2.0 licensing combined with local hardware keeps data on-device, making the setup a viable building block for HIPAA-compliant alternatives to proprietary APIs in clinical and legal document processing
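
As a rough sanity check on the figures above, the sketch below estimates the weight footprint and a bandwidth-only ceiling on decode speed. The parameter counts and the 614 GB/s figure come from this summary. The 4.25 bits per parameter (4-bit values plus one 8-bit scale shared by each 32-value block) is an assumption, and applying it to every parameter is a simplification. It is a back-of-envelope estimate, not a benchmark.

```python
# Back-of-envelope estimate. Constants are the figures quoted in the summary,
# except BITS_PER_PARAM, which is an assumption (see below).

TOTAL_PARAMS = 117e9           # total parameters, as quoted
ACTIVE_PARAMS = 5.1e9          # parameters activated per token, as quoted
BANDWIDTH_BYTES_PER_S = 614e9  # memory bandwidth, as quoted

# Assumption: every weight is stored at 4.25 bits (4-bit value plus an
# 8-bit scale shared across a 32-value block). Real checkpoints may keep
# some layers at higher precision, so treat this as approximate.
BITS_PER_PARAM = 4.25


def weight_bytes(params, bits_per_param=BITS_PER_PARAM):
    """Bytes needed to store `params` weights at the given bit width."""
    return params * bits_per_param / 8


def decode_ceiling_tok_per_s(active_params, bandwidth_bytes_per_s,
                             bits_per_param=BITS_PER_PARAM):
    """Upper bound on tokens/sec if each token only had to read the
    active weights once from memory. Ignores KV-cache reads, attention,
    and compute time, so measured speeds will be lower."""
    return bandwidth_bytes_per_s / weight_bytes(active_params, bits_per_param)


footprint_gb = weight_bytes(TOTAL_PARAMS) / 1e9
ceiling = decode_ceiling_tok_per_s(ACTIVE_PARAMS, BANDWIDTH_BYTES_PER_S)

print(f"Estimated weight footprint: {footprint_gb:.1f} GB")
print(f"Bandwidth-only decode ceiling: {ceiling:.0f} tok/s")
```

Under these assumptions the weights come to about 62 GB, consistent with the "within 70GB" figure, and the bandwidth-only ceiling is roughly 227 tok/s. The reported 60+ tok/s sits well below that ceiling, which is expected once KV-cache traffic, attention, and compute overhead are counted.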
// TAGS
gpt-oss-120b, mlx, llm, inference, open-source, apple-silicon, edge-ai

DISCOVERED

54d ago

2026-04-04

PUBLISHED

54d ago

2026-04-03

RELEVANCE

9/10

AUTHOR

Plus-Conclusion-3169