GPT-5.4 tops ZeroBench leaderboard

// 78d ago · BENCHMARK RESULT

GPT-5.4 now leads ZeroBench, a hard multimodal reasoning benchmark built to stress contemporary vision-language models with near-impossible visual questions. The current leaderboard shows GPT-5.4 (xhigh) at 23% pass@5 and 8% pass^5, ahead of Gemini 3.1 Pro at 19% and 7% on the same two metrics.
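The two metrics quoted above can diverge sharply, so a small sketch may help. It assumes the common reading of the notation: pass@5 counts a question as solved if at least one of five sampled attempts is correct, while pass^5 requires all five attempts to be correct. The function names and the toy data are hypothetical, and ZeroBench's own scoring scripts may differ in detail.

```python
from typing import Sequence


def pass_at_k(results: Sequence[Sequence[bool]]) -> float:
    """Fraction of questions where at least one attempt is correct (assumed pass@k)."""
    return sum(any(attempts) for attempts in results) / len(results)


def pass_hat_k(results: Sequence[Sequence[bool]]) -> float:
    """Fraction of questions where every attempt is correct (assumed pass^k)."""
    return sum(all(attempts) for attempts in results) / len(results)


# Hypothetical toy data: 4 questions, 5 attempts each (not real leaderboard results).
results = [
    [True, True, True, True, True],       # solved reliably
    [False, True, False, False, False],   # solved once out of five
    [False, False, False, False, False],  # never solved
    [False, False, False, False, False],  # never solved
]

print(pass_at_k(results))   # 0.5
print(pass_hat_k(results))  # 0.25
```

Under this reading, the gap between the two numbers is a rough measure of how often a model gets a question right only by chance across samples, which is why the stricter score sits so far below the lenient one.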

// ANALYSIS

This win matters because ZeroBench remains brutally hard: even small gains are more likely to reflect real progress in multimodal reasoning than leaderboard noise.

  • ZeroBench was introduced as an “impossible” visual benchmark, and frontier models are only now starting to post non-trivial scores
  • GPT-5.4 taking the top spot over Gemini 3.1 Pro suggests OpenAI is still highly competitive on image-heavy reasoning, not just text benchmarks
  • The absolute scores remain low, which is the bigger story for developers: multimodal reasoning is improving, but it is nowhere near solved
  • ZeroBench’s latest site update says its recent v3 wording tweaks did not affect scores, so this looks like a genuine model improvement rather than a benchmark reset
// TAGS
gpt-5-4, llm, multimodal, reasoning, benchmark

DISCOVERED: 78d ago (2026-03-11)
PUBLISHED: 79d ago (2026-03-10)
RELEVANCE: 9/10
AUTHOR: Waiting4AniHaremFDVR