REDDIT · BENCHMARK RESULT · 25d ago

HorizonMath finds GPT-5.4 Pro gains on unsolved math

Oxford researchers introduced HorizonMath, a benchmark of 100+ mostly unsolved computational and applied math problems with automatic verification, and reported that GPT-5.4 Pro improved on the best-known published results for two tasks. The claimed gains, including Kakeya-type and diagonal Ramsey improvements, are framed as potentially novel contributions pending expert validation.
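The "automatic verification" framing works because many of these problems admit machine-checkable certificates: a claimed bound comes with a concrete object that a small program can validate. As a minimal illustration (not HorizonMath's actual harness), a diagonal Ramsey lower bound R(3,3) > 5 is certified by exhibiting a 2-coloring of K_5 with no monochromatic triangle:

```python
from itertools import combinations

def mono_triangle_free(n, red):
    """Check a 2-coloring of K_n (red(i, j) -> bool) for monochromatic triangles.

    Returns True iff no triangle has all three edges the same color,
    which certifies the Ramsey lower bound R(3,3) > n.
    """
    for a, b, c in combinations(range(n), 3):
        # A triangle is monochromatic when its three edge colors agree.
        if len({red(a, b), red(a, c), red(b, c)}) == 1:
            return False
    return True

# Pentagon coloring of K_5: red iff the two vertices are adjacent on a
# 5-cycle; the blue edges form the complementary 5-cycle. Both cycles are
# triangle-free, so this coloring certifies R(3,3) > 5.
pentagon = lambda i, j: (i - j) % 5 in (1, 4)
print(mono_triangle_free(5, pentagon))  # True
```

The same pattern (claim plus checkable witness) is what makes this benchmark harder to game with pattern matching than known-answer sets.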

// ANALYSIS

This is the kind of benchmark result that matters more than leaderboard gaming, but it should be treated as a strong signal, not a solved milestone, until peer mathematicians fully verify the proofs and constants.

  • HorizonMath targets open problems where verification is tractable, which makes it harder to fake progress with pattern matching.
  • Two concrete improvements on unsolved problem classes are notable because most frontier models reportedly score near zero on this benchmark.
  • The one-hour reasoning runtime hints that breakthrough-style outputs may be possible with longer test-time compute, not just bigger pretraining.
  • If expert review confirms both results, AI evals may shift toward “novel contribution rate” instead of only accuracy on known-answer sets.
// TAGS
horizonmath · gpt-5-4-pro · llm · reasoning · benchmark · research

DISCOVERED

2026-03-17

PUBLISHED

2026-03-17

RELEVANCE

9/10

AUTHOR

armytricks