Anubis benchmark analysis grows, 371 runs
REDDIT // 4h ago // BENCHMARK RESULT

The Mac LLM benchmark dashboard now reflects 371 submitted runs across 218 models and 10 Apple chips. The latest refresh also fixes a thinking-toggle accounting issue that was skewing throughput and time-to-first-token (TTFT) comparisons.

// ANALYSIS

This is the kind of update that makes a niche benchmark useful: more submissions, clearer metrics, and fewer apples-to-oranges comparisons.

  • 371 runs across 218 models (about 1.7 runs per model on average) give the leaderboard broader coverage, especially for Apple Silicon buyers comparing real-world performance.
  • Separating reasoning time from output throughput is a meaningful fix; mixed accounting can make a model look faster or slower than it really is.
  • The prefill/output split is the right direction for local inference benchmarking because prompt ingestion and generation bottlenecks are very different.
  • The dataset is becoming a tuning signal, not just a scoreboard, which makes it more valuable for model tweakers and Mac-local AI users.
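
The accounting points above can be illustrated with a minimal sketch. It assumes a hypothetical per-run record (`RunTiming`) and a helper (`split_metrics`); neither name nor any field comes from Anubis, and the "mixed" figure shows only one plausible way that folding reasoning time into output throughput could distort a comparison, not the dashboard's actual bug.

```python
from dataclasses import dataclass


@dataclass
class RunTiming:
    """Hypothetical per-run measurements (illustrative, not the Anubis schema)."""
    prompt_tokens: int
    reasoning_tokens: int     # hidden "thinking" tokens
    answer_tokens: int        # visible answer tokens
    prefill_seconds: float    # prompt ingestion, up to the first generated token
    reasoning_seconds: float  # time spent generating thinking tokens
    answer_seconds: float     # time spent generating visible answer tokens


def split_metrics(run: RunTiming) -> dict:
    """Report prefill and generation separately, plus a 'mixed' figure for contrast."""
    decode_tokens = run.reasoning_tokens + run.answer_tokens
    decode_seconds = run.reasoning_seconds + run.answer_seconds
    return {
        # Prompt ingestion speed, independent of generation.
        "prefill_tps": run.prompt_tokens / run.prefill_seconds,
        # Generation speed counting every generated token, thinking included.
        "decode_tps": decode_tokens / decode_seconds,
        # Assumed failure mode: reasoning time is charged, but only
        # visible answer tokens are credited.
        "mixed_tps": run.answer_tokens / decode_seconds,
        # Time to the first generated token of any kind.
        "ttft_first_token_s": run.prefill_seconds,
        # Time until the first visible answer token appears.
        "ttft_first_answer_s": run.prefill_seconds + run.reasoning_seconds,
    }


if __name__ == "__main__":
    # Made-up numbers, chosen only to make the arithmetic easy to follow.
    example = RunTiming(
        prompt_tokens=1000, reasoning_tokens=400, answer_tokens=200,
        prefill_seconds=2.0, reasoning_seconds=10.0, answer_seconds=5.0,
    )
    for name, value in split_metrics(example).items():
        print(f"{name}: {value:.2f}")
```

With these made-up numbers the same run generates at 40 tokens/s when all generated tokens are counted, but appears to run at about 13 tokens/s under the mixed accounting, which is the sort of gap that makes thinking-on and thinking-off runs hard to compare.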
// TAGS
benchmark, evaluation, inference, data-tools, open-source, anubis

DISCOVERED

4h ago

2026-05-05

PUBLISHED

6h ago

2026-05-05

RELEVANCE

7/10

AUTHOR

peppaz