GPT-5.4 tops Extended Connections benchmark
REDDIT · 37d ago · BENCHMARK RESULT

On Lech Mazur’s Extended NYT Connections benchmark, GPT-5.4 scores 94.0 in extra high mode and 92.0 in medium, up from GPT-5.2’s 88.6 and 71.4, respectively, on the same puzzle set. The no-reasoning score rises only modestly, from 28.1 to 32.8, which suggests that most of the gain comes from stronger deliberate reasoning rather than raw pattern matching.

// ANALYSIS

GPT-5.4 looks meaningfully better on a puzzle-heavy reasoning benchmark, but the split between reasoning and no-reasoning modes is the real story for developers evaluating cost, latency, and capability tradeoffs.

  • The medium-mode jump from 71.4 to 92.0, a gain of 20.6 points, is far larger than the 5.4-point gain at extra high and suggests OpenAI improved practical reasoning efficiency, not just max-effort performance.
  • The benchmark uses 759 NYT Connections puzzles with extra trick words, so it is testing categorization and distractor resistance rather than straight factual recall.
  • The weak no-reasoning score relative to reasoning modes reinforces how much structured inference still matters on deceptively simple language tasks.
  • This is a strong directional signal, but it is still a niche third-party benchmark rather than a full proxy for coding, agent reliability, or production workloads.
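
The per-mode comparison behind these points can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the six scores quoted in the summary; the names `SCORES`, `gains`, and `reasoning_gap` are illustrative and are not part of the benchmark's own tooling.

```python
# Scores quoted in the summary: mode -> (GPT-5.2, GPT-5.4).
SCORES = {
    "extra high": (88.6, 94.0),
    "medium": (71.4, 92.0),
    "no reasoning": (28.1, 32.8),
}


def gains(scores):
    """Absolute point gain from GPT-5.2 to GPT-5.4 for each mode."""
    return {mode: round(new - old, 1) for mode, (old, new) in scores.items()}


def reasoning_gap(scores, model_index, mode="medium", baseline="no reasoning"):
    """Points a model gains by reasoning in `mode` versus the no-reasoning baseline.

    model_index is 0 for GPT-5.2 and 1 for GPT-5.4 (an ordering chosen for this sketch).
    """
    return round(scores[mode][model_index] - scores[baseline][model_index], 1)


if __name__ == "__main__":
    for mode, delta in gains(SCORES).items():
        print(f"{mode}: +{delta} points")
    print("GPT-5.2 medium vs no reasoning:", reasoning_gap(SCORES, 0))
    print("GPT-5.4 medium vs no reasoning:", reasoning_gap(SCORES, 1))
```

On these numbers the medium-mode gain (20.6 points) is roughly four times the gain at extra high (5.4) or with no reasoning (4.7), and the gap between medium and no-reasoning widens from 43.3 to 59.2 points, which is the pattern the analysis describes.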
// TAGS
gpt-5-4 · llm · reasoning · benchmark · research

DISCOVERED

37d ago

2026-03-06

PUBLISHED

37d ago

2026-03-05

RELEVANCE

8/10

AUTHOR

zero0_one1