OPEN_SOURCE
REDDIT // 24d ago // BENCHMARK RESULT

GPT-OSS 120B flies on M5 Max

A Reddit benchmark on an M5 Max with 128GB of unified memory compares three ~120B-parameter models: Nemotron-3 Super, GPT-OSS 120B, and Qwen3.5 122B. GPT-OSS lands behind Nemotron on answer quality but comes in far ahead on speed, at roughly 77 tokens/sec versus about 35 for the other two.

// ANALYSIS

This is a strong reminder that local-LLM performance is no longer just about raw parameter count. On Apple silicon, model architecture and quantization can matter as much as size, and that changes which models feel practical day to day (see the back-of-the-envelope sketch after the list below).

  • Nemotron-3 Super looks like the best pick if you care most about answer quality in this specific test.
  • GPT-OSS 120B is the most interesting result because its throughput is high enough to make a 120B model feel interactive.
  • Qwen3.5 122B trailing both suggests “bigger” does not automatically mean “better” once you factor in runtime efficiency.
  • The result is still anecdotal, so it is useful as a real-world signal, not a universal ranking.
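
// SKETCH

One plausible reason for the speed gap: GPT-OSS 120B is a sparse mixture-of-experts model with only ~5B parameters active per token, so a memory-bound decoder reads far fewer weight bytes per step than a dense 120B model does. The minimal Python sketch below makes that arithmetic concrete; the bandwidth and bit-width figures are illustrative assumptions, not measurements from the Reddit post.

def est_tokens_per_sec(active_params_b: float, bits_per_weight: float,
                       bandwidth_gb_s: float) -> float:
    """Rough upper bound on decode throughput for a memory-bound model:
    tokens/sec ~= memory bandwidth / bytes of active weights read per token."""
    bytes_per_token = active_params_b * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token

BW = 550.0  # GB/s -- hypothetical M5 Max-class bandwidth (assumption)

# A dense ~120B model at 4-bit must stream all of its weights every token...
print(f"dense 120B @ 4-bit     : {est_tokens_per_sec(120.0, 4, BW):6.1f} tok/s")
# ...while an MoE with ~5B active parameters reads only a small slice of them,
# which is how a '120B' model can feel interactive on a laptop.
print(f"MoE ~5B active @ 4-bit : {est_tokens_per_sec(5.1, 4, BW):6.1f} tok/s")

These are ceilings, not predictions: real decode speed also pays for KV-cache reads, attention compute, and framework overhead, which is why the observed 77 tokens/sec sits well below the naive MoE bound.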
// TAGS
gpt-oss-120b · llm · benchmark · open-weights · inference · self-hosted

DISCOVERED

2026-03-19

PUBLISHED

2026-03-18

RELEVANCE

8/10

AUTHOR

albertgao