GPT-OSS 120B tops 60 tok/sec on M5 Max
OPEN_SOURCE
REDDIT · 8d ago · BENCHMARK RESULT


OpenAI's 117B-parameter MoE model sustains over 60 tokens/sec, well beyond human reading speed, on the MacBook Pro M5 Max, leveraging 128GB of unified memory and Apple's MLX framework. A notable milestone for local inference of strong reasoning models on portable hardware.

// ANALYSIS

The arrival of workstation-class performance on a laptop sharply reduces cloud dependency for privacy-sensitive professional workflows.

  • The MoE architecture activates only 5.1B of the 117B parameters per token, letting the 120B model reach throughput typical of much smaller dense models
  • The M5 Max's 614 GB/s memory bandwidth is the critical enabler, roughly doubling prior-generation performance for bandwidth-bound local inference
  • MXFP4 quantization (about 4.25 bits per weight) preserves model quality while fitting the weights within roughly 70GB, leaving room for 128k-token context windows on 128GB machines
  • Apache 2.0 licensing combined with fully local hardware offers a viable alternative to proprietary APIs for HIPAA-sensitive clinical and legal document processing
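The bandwidth and active-parameter figures above can be tied together with a back-of-envelope check: autoregressive decoding is memory-bandwidth bound, so the throughput ceiling is roughly bandwidth divided by bytes read per token. A minimal sketch, using the article's numbers (the efficiency comparison against the observed 60 tok/s is an illustration, not a measured breakdown):

```python
# Decode-phase ceiling estimate: tokens/sec <= bandwidth / bytes-per-token.
# All input figures come from the article; MXFP4's ~4.25 bits/weight
# accounts for 4-bit values plus shared per-block scale factors.
BANDWIDTH_BYTES_PER_S = 614e9   # M5 Max unified memory bandwidth
ACTIVE_PARAMS = 5.1e9           # MoE parameters activated per token
TOTAL_PARAMS = 117e9            # full model size
BITS_PER_WEIGHT = 4.25          # MXFP4 effective bits per weight

bytes_per_token = ACTIVE_PARAMS * BITS_PER_WEIGHT / 8
ceiling_tps = BANDWIDTH_BYTES_PER_S / bytes_per_token
weights_gb = TOTAL_PARAMS * BITS_PER_WEIGHT / 8 / 1e9

print(f"theoretical decode ceiling: {ceiling_tps:.0f} tok/s")
print(f"observed 60 tok/s is {60 / ceiling_tps:.0%} of that ceiling")
print(f"quantized weights footprint: {weights_gb:.1f} GB")
```

The ceiling lands in the low hundreds of tokens/sec, so the reported 60 tok/s is a plausible fraction of it once attention, KV-cache reads, and runtime overhead are accounted for; the ~62 GB weight footprint is likewise consistent with the article's ~70GB figure after runtime buffers.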
// TAGS
gpt-oss-120b · mlx · llm · inference · open-source · apple-silicon · edge-ai

DISCOVERED

2026-04-04 (8d ago)

PUBLISHED

2026-04-03 (8d ago)

RELEVANCE

9/10

AUTHOR

Plus-Conclusion-3169