GPT-5.4 tops ZeroBench leaderboard
REDDIT · 32d ago · BENCHMARK RESULT

GPT-5.4 now leads ZeroBench, a hard multimodal reasoning benchmark built to stress contemporary vision-language models with near-impossible visual questions. The current leaderboard shows GPT-5.4 (xhigh) at 23% pass@5 and 8% pass^5, ahead of Gemini 3.1 Pro at 19% pass@5 and 7% pass^5.
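
For readers unfamiliar with the two metrics, here is a minimal sketch of how they are commonly computed. It assumes the usual reading: pass@5 counts a question as solved if at least one of five sampled answers is correct, and pass^5 counts it only if all five are. The function names and sample data are hypothetical, and this is not ZeroBench's own evaluation code.

```python
from typing import Sequence


def pass_at_k(attempts: Sequence[Sequence[bool]]) -> float:
    """Fraction of questions with at least one correct attempt (assumed pass@k)."""
    if not attempts:
        raise ValueError("need at least one question")
    return sum(any(q) for q in attempts) / len(attempts)


def pass_hat_k(attempts: Sequence[Sequence[bool]]) -> float:
    """Fraction of questions where every attempt is correct (assumed pass^k)."""
    if not attempts:
        raise ValueError("need at least one question")
    return sum(all(q) for q in attempts) / len(attempts)


# Hypothetical outcomes: 4 questions, 5 sampled answers each.
results = [
    [True, True, True, True, True],       # solved every time
    [False, True, False, False, False],   # solved once
    [False] * 5,                          # never solved
    [False] * 5,                          # never solved
]

print(f"pass@5 = {pass_at_k(results):.0%}")   # pass@5 = 50%
print(f"pass^5 = {pass_hat_k(results):.0%}")  # pass^5 = 25%
```

Under this reading, the gap between the two numbers is a rough measure of consistency: a model can stumble onto a correct answer in one of five tries far more often than it can produce it on every try.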

// ANALYSIS

This is a meaningful benchmark win because ZeroBench is still brutally hard: with top scores this low, even small gains are more likely to reflect real progress in multimodal reasoning than leaderboard noise.

  • ZeroBench was introduced as an “impossible” visual benchmark, and frontier models are only now starting to post non-trivial scores
  • GPT-5.4 taking the top spot over Gemini 3.1 Pro suggests OpenAI is still highly competitive on image-heavy reasoning, not just text benchmarks
  • The absolute scores remain low, which is the bigger story for developers: multimodal reasoning is improving, but it is nowhere near solved
  • ZeroBench’s latest site update says its recent v3 wording tweaks did not affect scores, so this looks like a genuine model improvement rather than a benchmark reset

// TAGS

gpt-5-4 · llm · multimodal · reasoning · benchmark

DISCOVERED

32d ago

2026-03-11

PUBLISHED

33d ago

2026-03-10

RELEVANCE

9/10

AUTHOR

Waiting4AniHaremFDVR