GPT-5.5 Jumps to NYT Connections No. 2
REDDIT // 4h ago // BENCHMARK RESULT

GPT-5.5 posts a clear gain on the Extended NYT Connections benchmark, with xhigh reasoning rising from 94.0 to 97.5 and moving it ahead of Claude Opus 4.6. Gemini 3.1 Pro Preview still leads, while Kimi K2.6 becomes the top open-weights model.

// ANALYSIS

This is a real benchmark gain for GPT-5.5, but the more interesting signal is how crowded the frontier has become: dropping GPT-5.5 from xhigh to medium reasoning costs 2.5 points, more than twice its 0.9-point gap to the leader, and open weights are now close enough to matter operationally.

  • GPT-5.5 xhigh goes from 94.0 to 97.5, high from 93.6 to 96.9, medium from 92.0 to 95.0, and no reasoning from 32.8 to 37.5.
  • Gemini 3.1 Pro Preview remains #1 at 98.4, so GPT-5.5 is strong but not a new leader.
  • Kimi K2.6 at 91.4 is the standout open-weights result, ahead of Kimi K2.5 at 78.3 and well above DeepSeek V3.2 at 50.2.
  • Opus 4.7 looks weaker on this benchmark than Opus 4.6, especially with the high-reasoning refusal rate noted in the source thread.
  • This benchmark is a narrow reasoning test, so I would treat it as a model-selection signal, not a general verdict on coding or agent quality.
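
The deltas behind the bullets above can be checked with a few lines of Python. This is only a sketch over the numbers quoted in this post: the dictionary layout and the function names `gains` and `step_down_costs` are illustrative assumptions, and the "old" column is whatever earlier result the source compares against, which the post does not name.

```python
# Scores quoted in the bullets above (Extended NYT Connections, percent).
# Each entry is (earlier score, GPT-5.5 score); the earlier model is not named in the post.
GPT_SCORES = {
    "xhigh": (94.0, 97.5),
    "high": (93.6, 96.9),
    "medium": (92.0, 95.0),
    "none": (32.8, 37.5),
}
LEADER_SCORE = 98.4  # Gemini 3.1 Pro Preview, per the post


def gains(scores):
    """Point gain at each reasoning level (new minus old), rounded to 0.1."""
    return {level: round(new - old, 1) for level, (old, new) in scores.items()}


def step_down_costs(scores, order):
    """Points lost by moving one reasoning level down, using the new scores only."""
    costs = {}
    for upper, lower in zip(order, order[1:]):
        costs[f"{upper}->{lower}"] = round(scores[upper][1] - scores[lower][1], 1)
    return costs


if __name__ == "__main__":
    print(gains(GPT_SCORES))
    print(step_down_costs(GPT_SCORES, ["xhigh", "high", "medium", "none"]))
    print(round(LEADER_SCORE - GPT_SCORES["xhigh"][1], 1))  # gap to the leader
```

On these inputs the gains are 3.5, 3.3, 3.0 and 4.7 points, the step-down costs are 0.6 (xhigh to high) and 1.9 (high to medium), and the gap to the leader is 0.9. The drop from medium to no reasoning is 57.5 points, which is why the no-reasoning row says little about the frontier race.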
// TAGS
gpt-5.5, kimi-k2.6, benchmark, reasoning, open-weights, llm

DISCOVERED: 2026-04-27 (4h ago)
PUBLISHED: 2026-04-27 (4h ago)
RELEVANCE: 9/10
AUTHOR: zero0_one1