OPEN_SOURCE
REDDIT // BENCHMARK RESULT
GPT-OSS 120B tops 60 tok/sec on M5 Max
OpenAI's 117B-parameter MoE model generates text faster than a person can read it on the MacBook Pro M5 Max, leveraging 128GB of unified memory and the MLX framework. A breakthrough for local inference of high-reasoning models on portable hardware.
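For context, a local run of this kind with the mlx-lm package looks roughly like the sketch below. The model identifier is a placeholder for whichever MXFP4 MLX conversion of gpt-oss-120b the benchmark used (the post does not name the exact repo), and prompt content is illustrative only.

```python
# Minimal local-inference sketch with mlx-lm (pip install mlx-lm).
# MODEL_ID is a placeholder -- substitute the actual MXFP4/4-bit
# conversion of gpt-oss-120b published for MLX.
from mlx_lm import load, generate

MODEL_ID = "mlx-community/gpt-oss-120b-MXFP4"  # placeholder identifier

model, tokenizer = load(MODEL_ID)

prompt = "Summarize the key risks in this clinical trial protocol:"

# Apply the model's chat template if one is defined, so the prompt
# matches the formatting the model was trained on.
if hasattr(tokenizer, "apply_chat_template") and tokenizer.chat_template is not None:
    prompt = tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt}],
        add_generation_prompt=True,
    )

# verbose=True streams tokens and reports tokens/sec, which is how
# throughput figures like the one in the headline are typically read.
text = generate(model, tokenizer, prompt=prompt, max_tokens=512, verbose=True)
print(text)
```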
// ANALYSIS
The arrival of "workstation-class" performance on a laptop marks the end of cloud dependency for privacy-sensitive professional workflows.
- MoE architecture activates only 5.1B parameters per token, allowing the 120B model to achieve throughput typical of much smaller dense models
- The M5 Max's 614 GB/s memory bandwidth is the critical enabler, effectively doubling prior-generation performance for large-scale local inference (see the back-of-envelope check after this list)
- MXFP4 quantization preserves high precision while fitting the model within 70GB, leaving ample room for 128k-token context windows on 128GB machines
- Apache 2.0 licensing combined with local hardware provides a viable, HIPAA-compliant alternative to proprietary APIs for clinical and legal document processing
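As a rough sanity check on why memory bandwidth is the limiting factor, the sketch below computes the bandwidth-limited ceiling implied by the figures above. It is a simplified roofline estimate under the assumption that per-token weight reads dominate, not a measurement; the observed ~60 tok/s sits well below this bound once KV-cache traffic, expert routing, and framework overhead are included.

```python
# Back-of-envelope throughput ceiling from the numbers quoted above.
active_params = 5.1e9        # parameters activated per token (MoE)
bits_per_param = 4           # MXFP4 quantization
mem_bandwidth_gbs = 614      # M5 Max unified-memory bandwidth, GB/s

bytes_per_token = active_params * bits_per_param / 8   # ~2.55 GB read per token
ceiling_tok_s = mem_bandwidth_gbs * 1e9 / bytes_per_token

print(f"Weight traffic per token: {bytes_per_token / 1e9:.2f} GB")
print(f"Bandwidth-limited ceiling: {ceiling_tok_s:.0f} tok/s")  # roughly 240 tok/s
```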
// TAGS
gpt-oss-120b · mlx · llm · inference · open-source · apple-silicon · edge-ai
DISCOVERED
2026-04-04
PUBLISHED
2026-04-03
RELEVANCE
9/10
AUTHOR
Plus-Conclusion-3169