OPEN_SOURCE
REDDIT // BENCHMARK RESULT
HorizonMath finds GPT-5.4 Pro gains on unsolved math
Oxford researchers introduced HorizonMath, a benchmark of 100+ mostly unsolved computational and applied math problems with automatic verification, and reported that GPT-5.4 Pro improved best-known published results on two tasks. The claimed gains, including Kakeya-type and diagonal Ramsey improvements, are framed as potential novel contributions pending expert validation.
// ANALYSIS
This is the kind of benchmark result that matters more than leaderboard gaming, but it should be treated as a strong signal, not a solved milestone, until peer mathematicians fully verify the proofs and constants.
- HorizonMath targets open problems where verification is tractable, which makes it harder to fake progress with pattern matching.
- Two concrete improvements on unsolved problem classes are notable because most frontier models reportedly score near zero on this benchmark.
- The one-hour reasoning runtime hints that breakthrough-style outputs may come from longer test-time compute, not just bigger pretraining.
- If expert review confirms both results, AI evals may shift toward "novel contribution rate" instead of accuracy on known-answer sets alone.
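The automatic-verification design the summary describes can be sketched as follows. This is a hypothetical illustration, not HorizonMath's actual code: the `Problem` record, its `check` callback, and the toy task are all assumptions. The idea is that a submission is accepted only if a machine-checkable certificate supports the claimed value and that value strictly beats the best published one.

```python
# Hypothetical sketch of benchmark-style automatic verification (assumed
# design, not HorizonMath's real harness).

from dataclasses import dataclass
from typing import Callable


@dataclass
class Problem:
    name: str
    best_known: float                        # best published value
    higher_is_better: bool                   # direction of improvement
    check: Callable[[object, float], bool]   # does the certificate yield the claimed value?


def verify(problem: Problem, certificate: object, claimed: float) -> bool:
    """Accept only certified results that strictly improve the record."""
    if not problem.check(certificate, claimed):
        return False  # certificate does not support the claimed value
    if problem.higher_is_better:
        return claimed > problem.best_known
    return claimed < problem.best_known


# Toy stand-in for a "find a larger construction" task: the certificate
# is a list and the claimed value is its length.
toy = Problem(
    name="toy-lower-bound",
    best_known=41,
    higher_is_better=True,
    check=lambda cert, v: isinstance(cert, list) and len(cert) == v,
)

print(verify(toy, list(range(42)), 42))  # True: certified and beats 41
print(verify(toy, list(range(40)), 40))  # False: does not improve the record
```

The key property this models is the one the analysis highlights: a model cannot score by pattern-matching plausible text, because only a checkable artifact that moves the recorded bound counts.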
// TAGS
horizonmath · gpt-5-4-pro · llm · reasoning · benchmark · research
DISCOVERED
2026-03-17
PUBLISHED
2026-03-17
RELEVANCE
9/10
AUTHOR
armytricks