OPEN_SOURCE
REDDIT · 32d ago · BENCHMARK RESULT
GPT-5.4 tops ZeroBench leaderboard
GPT-5.4 now leads ZeroBench, a hard multimodal reasoning benchmark built to stress contemporary vision-language models on near-impossible visual questions. The current leaderboard shows GPT-5.4 (xhigh) at 23% pass@5 and 8% pass^5, ahead of Gemini 3.1 Pro at 19% and 7%.
// ANALYSIS
This is a useful benchmark win because ZeroBench is still brutally hard, so even small gains usually reflect real progress in multimodal reasoning rather than leaderboard noise.
- ZeroBench was introduced as an “impossible” visual benchmark, and frontier models are only now starting to post non-trivial scores
- GPT-5.4 taking the top spot over Gemini 3.1 Pro suggests OpenAI is still highly competitive on image-heavy reasoning, not just text benchmarks
- The absolute scores remain low, which is the bigger story for developers: multimodal reasoning is improving, but it is nowhere near solved
- ZeroBench’s latest site update says its recent v3 wording tweaks did not affect scores, so this looks like a genuine model improvement rather than a benchmark reset
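To make the headline numbers concrete: pass@5 counts a question as solved if any of the five sampled answers is correct, while pass^5 requires all five to be correct, so pass^5 rewards reliability rather than lucky hits. A minimal illustrative sketch (not ZeroBench's official harness; the attempt grid below is hypothetical):

```python
# Each question gets k sampled answers; results is a list of per-question
# lists of booleans (True = that attempt was correct).

def pass_at_k(results):
    # pass@k: fraction of questions where ANY attempt succeeded
    return sum(any(r) for r in results) / len(results)

def pass_pow_k(results):
    # pass^k: fraction of questions where ALL attempts succeeded
    return sum(all(r) for r in results) / len(results)

# Hypothetical grid: 4 questions x 5 attempts each
grid = [
    [True, False, False, False, False],  # solved once -> counts for pass@5 only
    [True, True, True, True, True],      # solved every time -> counts for both
    [False, False, False, False, False], # never solved
    [False, True, True, False, False],   # solved sometimes -> pass@5 only
]

print(pass_at_k(grid))   # 0.75
print(pass_pow_k(grid))  # 0.25
```

The gap between the two columns on the leaderboard (23% vs 8% for GPT-5.4) is exactly this reliability gap: the model can sometimes reach a correct answer it cannot reproduce consistently.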
// TAGS
gpt-5-4 · llm · multimodal · reasoning · benchmark
DISCOVERED
32d ago
2026-03-11
PUBLISHED
33d ago
2026-03-10
RELEVANCE
9/10
AUTHOR
Waiting4AniHaremFDVR