OPEN_SOURCE
REDDIT // 24d ago · BENCHMARK RESULT
GPT-OSS 120B flies on M5 Max
A Reddit benchmark on an M5 Max 128GB compares three ~120B models: Nemotron-3 Super, GPT-OSS 120B, and Qwen3.5 122B. GPT-OSS lands behind Nemotron on quality but comes in far ahead on speed, at roughly 77 tokens/sec versus about 35 for the others.
// ANALYSIS
This is a strong reminder that local-LLM performance is no longer just about raw parameter count. On Apple silicon, model architecture and quantization can matter as much as size, and that changes which models feel practical day to day.
- Nemotron-3 Super looks like the best pick if you care most about answer quality in this specific test.
- GPT-OSS 120B is the most interesting result because its throughput is high enough to make a 120B model feel interactive.
- Qwen3.5 122B trailing both suggests "bigger" does not automatically mean "better" once you factor in runtime efficiency.
- The result is still anecdotal, so it is useful as a real-world signal, not a universal ranking.
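The interactivity point can be made concrete with back-of-the-envelope arithmetic. A minimal sketch using the throughput figures reported in the benchmark; the answer length of 500 tokens is an assumption, not from the source:

```python
def generation_time(tokens: int, tokens_per_sec: float) -> float:
    """Seconds to generate `tokens` at a steady decode rate."""
    return tokens / tokens_per_sec

# Assumed length of a typical chat answer (not from the benchmark).
answer_tokens = 500

# Throughputs as reported in the Reddit benchmark.
for model, tps in [("GPT-OSS 120B", 77.0), ("Nemotron-3 Super", 35.0)]:
    secs = generation_time(answer_tokens, tps)
    print(f"{model}: ~{secs:.1f}s for {answer_tokens} tokens")
```

At ~77 tok/s a 500-token answer arrives in about 6.5 seconds versus roughly 14 seconds at ~35 tok/s, which is the difference between a response that feels interactive and one that feels like a wait.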
// TAGS
gpt-oss-120b · llm · benchmark · open-weights · inference · self-hosted
DISCOVERED
24d ago
2026-03-19
PUBLISHED
24d ago
2026-03-18
RELEVANCE
8/10
AUTHOR
albertgao