Mac mini 32GB Hits 34 tok/s on gpt-oss-20b
OPEN_SOURCE
REDDIT · 25d ago · BENCHMARK RESULT


A Reddit user shared a concrete local inference benchmark for LM Studio on a Mac mini with 32GB of memory, running Unsloth’s gpt-oss-20b-Q4_K_S.gguf with a 26,035-token context. With OpenClaw 2026.3.8, LM Studio 0.4.6+1, and mostly default inference settings, the setup reportedly reached 34 tok/s with about 0.7 seconds to first token on prompts after the first.

// ANALYSIS

Real-world local LLM benchmarks like this are useful because they show what a 20B open-weight model feels like on mainstream Apple hardware, not just in polished demos.

  • The setup is specific enough to be actionable: Mac mini 32GB, LM Studio 0.4.6+1, Q4_K_S quantization, 26k context, and mostly default runtime settings.
  • 34 tok/s with sub-second TTFT is a strong practical result for local chat, especially at that context length.
  • This is still a single-user datapoint, so it should be read as directional rather than a controlled benchmark suite.
  • The bigger takeaway is that this class of open-weight model is now comfortably usable on a 32GB desktop.
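As a rough illustration of how numbers like these are usually derived, here is a minimal sketch (a hypothetical helper, not code from the post or from LM Studio) that computes time-to-first-token and decode throughput from the arrival timestamps of streamed tokens:

```python
# Illustrative sketch: deriving TTFT and tok/s from streamed token timestamps.
def summarize_stream(token_times, prompt_sent_at=0.0):
    """token_times: seconds at which each generated token arrived."""
    ttft = token_times[0] - prompt_sent_at
    # Decode throughput is measured over the tokens after the first,
    # so prompt-processing time does not skew the rate.
    decode_seconds = token_times[-1] - token_times[0]
    tok_per_s = (len(token_times) - 1) / decode_seconds
    return ttft, tok_per_s

# Synthetic trace matching the reported figures: first token at 0.7 s,
# then one token every 1/34 s.
trace = [0.7 + i / 34 for i in range(101)]
ttft, rate = summarize_stream(trace)
print(f"TTFT {ttft:.1f} s, {rate:.0f} tok/s")  # → TTFT 0.7 s, 34 tok/s
```

Separating TTFT from decode rate matters here: at long contexts like 26k tokens, prompt processing dominates the first response, while tok/s reflects steady-state generation.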
// TAGS
lm-studio · gpt-oss-20b · local-llm · mac-mini · benchmark · apple-silicon · inference · quantization

DISCOVERED: 2026-03-18 (25d ago)

PUBLISHED: 2026-03-18 (25d ago)

RELEVANCE: 8/10

AUTHOR: groover75