OPEN_SOURCE
REDDIT // 4h ago // BENCHMARK RESULT
Anubis benchmark analysis grows to 371 runs
The Mac LLM benchmark dashboard now reflects 371 submitted runs across 218 models and 10 Apple chips. The latest refresh also fixes an accounting issue with the thinking (reasoning) toggle that was skewing throughput and time-to-first-token (TTFT) comparisons.
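The post does not say how Anubis computes its metrics, so the following is only a minimal Python sketch of how folding reasoning time into output metrics can distort both TTFT and throughput. It assumes a hypothetical per-run record: `RunTiming`, its fields, and both metric functions are illustrative names, not Anubis's schema or code.

```python
from dataclasses import dataclass


@dataclass
class RunTiming:
    # Hypothetical per-run record (not Anubis's schema); timestamps in seconds.
    request_start: float
    first_token: float         # first generated token of any kind, reasoning included
    first_answer_token: float  # first visible answer token, after reasoning ends
    end: float                 # last generated token
    reasoning_tokens: int
    answer_tokens: int


def mixed_metrics(run: RunTiming) -> dict[str, float]:
    """Mixed accounting: reasoning time is silently charged to the answer."""
    return {
        # TTFT measured to the first *visible* token, so it includes all reasoning time
        "ttft_s": run.first_answer_token - run.request_start,
        # Only visible tokens are counted, but divided by the whole generation window
        "output_tps": run.answer_tokens / (run.end - run.first_token),
    }


def separated_metrics(run: RunTiming) -> dict[str, float]:
    """Separated accounting: reasoning is reported as its own phase."""
    generated = run.reasoning_tokens + run.answer_tokens
    return {
        "ttft_s": run.first_token - run.request_start,
        "reasoning_s": run.first_answer_token - run.first_token,
        "decode_tps": generated / (run.end - run.first_token),
        "answer_tps": run.answer_tokens / (run.end - run.first_answer_token),
    }


# Made-up example run: 8 s of reasoning, then a short visible answer.
run = RunTiming(request_start=0.0, first_token=0.5, first_answer_token=8.5,
                end=10.5, reasoning_tokens=320, answer_tokens=80)
print(mixed_metrics(run))      # {'ttft_s': 8.5, 'output_tps': 8.0}
print(separated_metrics(run))  # {'ttft_s': 0.5, 'reasoning_s': 8.0, 'decode_tps': 40.0, 'answer_tps': 40.0}
```

In this made-up run the model decodes at a steady 40 tokens/s, yet the mixed accounting reports 8 tokens/s and an 8.5 s TTFT because eight seconds of reasoning are charged to the visible answer.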
// ANALYSIS
This is the kind of update that makes a niche benchmark useful: more submissions, clearer metrics, and fewer apples-to-oranges comparisons.
- 371 runs and 218 models give the leaderboard more statistical weight, especially for Apple Silicon buyers comparing real-world performance.
- Separating reasoning time from output throughput is a meaningful fix; mixed accounting can make a model look faster or slower than it really is.
- The prefill/output split is the right direction for local inference benchmarking because prompt ingestion and generation hit different bottlenecks: prefill is typically compute-bound, while token-by-token generation is typically limited by memory bandwidth.
- The dataset is becoming a tuning signal, not just a scoreboard, which makes it more valuable for model tweakers and Mac-local AI users.
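To illustrate why a prefill/output split matters, here is a minimal Python sketch, assuming hypothetical per-run fields (`Run`, `prompt_tokens`, `ttft_s`, and so on are illustrative names, not Anubis's schema). It also assumes TTFT is a usable proxy for prefill time, which is only an approximation: TTFT also includes scheduling overhead and the first decode step.

```python
from dataclasses import dataclass


@dataclass
class Run:
    # Hypothetical per-run fields; not Anubis's actual schema.
    prompt_tokens: int
    generated_tokens: int
    ttft_s: float   # request start to first generated token (approximates prefill)
    total_s: float  # request start to last generated token


def prefill_tps(run: Run) -> float:
    """Prompt-ingestion rate, using TTFT as a proxy for prefill time."""
    return run.prompt_tokens / run.ttft_s


def decode_tps(run: Run) -> float:
    """Generation rate after the first token has been produced."""
    decode_window = run.total_s - run.ttft_s
    # The first token arrives at the end of the TTFT window, so exclude it here.
    return (run.generated_tokens - 1) / decode_window


def blended_tps(run: Run) -> float:
    """Single end-to-end number that hides which phase is the bottleneck."""
    return run.generated_tokens / run.total_s


# Two made-up runs on the same model: identical per-phase speeds, different prompts.
long_prompt = Run(prompt_tokens=4000, generated_tokens=201, ttft_s=2.0, total_s=7.0)
short_prompt = Run(prompt_tokens=500, generated_tokens=201, ttft_s=0.25, total_s=5.25)

for name, r in [("long prompt", long_prompt), ("short prompt", short_prompt)]:
    print(name, prefill_tps(r), decode_tps(r), round(blended_tps(r), 1))
# long prompt 2000.0 40.0 28.7
# short prompt 2000.0 40.0 38.3
```

Both made-up runs have identical prefill and decode rates, but the blended figure differs by almost 10 tokens/s purely because of prompt length. Reporting the two phases separately avoids that kind of apples-to-oranges comparison.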
// TAGS
benchmark · evaluation · inference · data-tools · open-source · anubis
DISCOVERED · 4h ago · 2026-05-05
PUBLISHED · 6h ago · 2026-05-05
RELEVANCE · 7/10
AUTHOR · peppaz