YOU ARE VIEWING ONE ITEM FROM THE AICRIER FEED

Mac mini 32GB Hits 34 tok/s on gpt-oss-20b

AICrier tracks AI developer news across Product Hunt, GitHub, Hacker News, YouTube, X, arXiv, and more. This page keeps the article you opened front and center while giving you a path into the live feed.

// WHAT AICRIER DOES

7+

TRACKED FEEDS

24/7

SCRAPED FEED

Short summaries, external links, screenshots, relevance scoring, tags, and featured picks for AI builders.

Mac mini 32GB Hits 34 tok/s on gpt-oss-20b
OPEN LINK ↗
// 71d agoBENCHMARK RESULT

Mac mini 32GB Hits 34 tok/s on gpt-oss-20b

A Reddit user shared a concrete local inference benchmark for LM Studio on a Mac mini with 32GB of memory, running Unsloth’s gpt-oss-20b-Q4_K_S.gguf at a 26,035-token context. With OpenClaw 2026.3.8, LM Studio 0.4.6+1, and mostly default inference settings, the setup reportedly reached 34 tok/s and about 0.7 seconds to first token after the first prompt.

// ANALYSIS

Real-world local LLM benchmarks like this are useful because they show what a 20B open-weight model feels like on mainstream Apple hardware, not just in polished demos.

  • The setup is specific enough to be actionable: Mac mini 32GB, LM Studio 0.4.6+1, Q4_K_S quantization, 26k context, and mostly default runtime settings.
  • 34 tok/s with sub-second TTFT is a strong practical result for local chat, especially at that context length.
  • This is still a single-user datapoint, so it should be read as directional rather than a controlled benchmark suite.
  • The bigger takeaway is that this class of open-weight model is now comfortably usable on a 32GB desktop.
// TAGS
lm-studiogpt-oss-20blocal-llmmac-minibenchmarkapple-siliconinferencequantization

DISCOVERED

71d ago

2026-03-18

PUBLISHED

71d ago

2026-03-18

RELEVANCE

8/ 10

AUTHOR

groover75