Anubis benchmark analysis grows, 371 runs

// BENCHMARK RESULT

The Anubis dashboard for benchmarking LLMs on Macs now reflects 371 submitted runs covering 218 models on 10 Apple chips. The latest refresh also fixes a thinking-toggle accounting issue that had been skewing throughput and time-to-first-token (TTFT) comparisons.

// ANALYSIS

This is the kind of update that makes a niche benchmark useful: more submissions, clearer metrics, and fewer apples-to-oranges comparisons.

– 371 runs and 218 models give the leaderboard broader coverage, especially for Apple Silicon buyers comparing real-world performance, though at about 1.7 runs per model on average, many models are still represented by a single run.
– Separating reasoning time from output throughput is a meaningful fix: when hidden thinking tokens are mixed into TTFT or tokens-per-second accounting, a model can look faster or slower than it really is.
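– The prefill/output split is the right direction for local inference benchmarking because prompt ingestion and generation hit different bottlenecks: prefill processes the whole prompt in parallel and tends to be compute-bound, while token-by-token generation is usually limited by memory bandwidth.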
– The dataset is becoming a tuning signal, not just a scoreboard, which makes it more valuable for model tweakers and Mac-local AI users.
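To illustrate why the accounting matters, here is a minimal sketch assuming a hypothetical per-run timing record. The field names (`t_first_token`, `t_first_answer_token`, and so on), the helper functions, and the sample numbers are all invented for illustration; they are not Anubis's schema or its actual formulas. The sketch contrasts a separated view (prefill, reasoning, and decode reported on their own) with a mixed view where thinking time leaks into TTFT and throughput.

```python
from dataclasses import dataclass


@dataclass
class RunTiming:
    # Hypothetical per-run record; field names are illustrative, not Anubis's schema.
    prompt_tokens: int           # tokens ingested during prefill
    reasoning_tokens: int        # hidden "thinking" tokens generated before the answer
    answer_tokens: int           # visible answer tokens
    t_first_token: float         # seconds from request to first generated token of any kind
    t_first_answer_token: float  # seconds from request to first visible answer token
    t_end: float                 # seconds from request to last generated token


def separated_metrics(r: RunTiming) -> dict:
    """Report prefill, reasoning, and decode separately."""
    decode_window = r.t_end - r.t_first_token
    generated = r.reasoning_tokens + r.answer_tokens
    return {
        # TTFT here reflects prompt ingestion only.
        "ttft_s": r.t_first_token,
        # Prefill throughput: prompt tokens over the time before generation starts.
        "prefill_tok_s": r.prompt_tokens / r.t_first_token,
        # Thinking time before the first visible token, reported on its own.
        "reasoning_s": r.t_first_answer_token - r.t_first_token,
        # Decode throughput (simple approximation): every generated token,
        # thinking included, over the whole generation window.
        "decode_tok_s": generated / decode_window,
    }


def mixed_metrics(r: RunTiming) -> dict:
    """Mixed accounting: thinking time leaks into both TTFT and throughput."""
    return {
        # TTFT measured to the first visible token silently includes reasoning time.
        "ttft_s": r.t_first_answer_token,
        # Only visible tokens are counted, but over the whole generation window.
        "decode_tok_s": r.answer_tokens / (r.t_end - r.t_first_token),
    }


# Made-up example run, not real benchmark data.
run = RunTiming(prompt_tokens=2000, reasoning_tokens=600, answer_tokens=400,
                t_first_token=1.0, t_first_answer_token=16.0, t_end=26.0)
print(separated_metrics(run))
# → {'ttft_s': 1.0, 'prefill_tok_s': 2000.0, 'reasoning_s': 15.0, 'decode_tok_s': 40.0}
print(mixed_metrics(run))
# → {'ttft_s': 16.0, 'decode_tok_s': 16.0}
```

For the same made-up run, the mixed view reports a 16 s TTFT and 16 tok/s, while the separated view shows 1 s to first token, 2000 tok/s prefill, 40 tok/s decode, and 15 s spent reasoning. The error can also run the other way: dividing all generated tokens by time measured only from the first visible token would overstate speed.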

// TAGS

benchmark, evaluation, inference, data-tools, open-source, anubis

DISCOVERED: 2026-05-05
PUBLISHED: 2026-05-05
RELEVANCE: 7/10
AUTHOR: peppaz
