GPT-5.5 Jumps to NYT Connections No. 2

// BENCHMARK RESULT

GPT-5.5 posts a clear gain on the Extended NYT Connections benchmark, with xhigh reasoning rising from 94.0 to 97.5 and moving it ahead of Claude Opus 4.6. Gemini 3.1 Pro Preview still leads, while Kimi K2.6 becomes the top open-weights model.

// ANALYSIS

This is a real benchmark win for GPT-5.5, but the more interesting signal is how crowded the frontier has become. Reasoning effort still moves scores: GPT-5.5 falls from 97.5 at xhigh to 96.9 at high and 95.0 at medium, which matters when the top two models are only 0.9 points apart. Open weights, meanwhile, are now close enough to matter operationally.

–GPT-5.5 xhigh goes from 94.0 to 97.5, high from 93.6 to 96.9, medium from 92.0 to 95.0, and no reasoning from 32.8 to 37.5.
–Gemini 3.1 Pro Preview remains #1 at 98.4, so GPT-5.5 is strong but not a new leader.
–Kimi K2.6 at 91.4 is the standout open-weights result, ahead of Kimi K2.5 at 78.3 and well above DeepSeek V3.2 at 50.2.
–Opus 4.7 looks weaker on this benchmark than Opus 4.6, especially given the refusal rate at high reasoning effort noted in the source thread.
–This benchmark is a narrow reasoning test, so I would treat it as a model-selection signal, not a general verdict on coding or agent quality.
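The gains and gaps quoted above are simple differences between reported scores. The minimal Python sketch below recomputes them from the numbers in this post. The dictionary layout and the helper names (`gains`, `gap_to_leader`, `step_down_cost`) are illustrative assumptions, not part of the benchmark's own tooling.

```python
# Scores as reported in this post (Extended NYT Connections).
# "before" holds the earlier GPT-5.5-line scores the post compares against.
before = {"xhigh": 94.0, "high": 93.6, "medium": 92.0, "none": 32.8}
gpt55 = {"xhigh": 97.5, "high": 96.9, "medium": 95.0, "none": 37.5}
GEMINI_31_PRO_PREVIEW = 98.4  # current #1 on the leaderboard


def gains(old, new):
    """Per-tier score change; rounded to one decimal to hide float noise."""
    return {tier: round(new[tier] - old[tier], 1) for tier in new}


def gap_to_leader(scores, leader):
    """How far each reasoning tier sits below the leading score."""
    return {tier: round(leader - s, 1) for tier, s in scores.items()}


def step_down_cost(scores):
    """Points lost by dropping one reasoning tier (dict order = tier order)."""
    tiers = list(scores)
    return {f"{a}->{b}": round(scores[a] - scores[b], 1)
            for a, b in zip(tiers, tiers[1:])}


print(gains(before, gpt55))
# {'xhigh': 3.5, 'high': 3.3, 'medium': 3.0, 'none': 4.7}
print(gap_to_leader(gpt55, GEMINI_31_PRO_PREVIEW))
# {'xhigh': 0.9, 'high': 1.5, 'medium': 3.4, 'none': 60.9}
print(step_down_cost(gpt55))
# {'xhigh->high': 0.6, 'high->medium': 1.9, 'medium->none': 57.5}
```

Rounding to one decimal matches the precision the scores are reported at; raw float subtraction would otherwise print values such as 3.3000000000000114.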

// TAGS

gpt-5.5, kimi-k2.6, benchmark, reasoning, open-weights, llm

DISCOVERED

2026-04-27

PUBLISHED

2026-04-27

RELEVANCE

9/10

AUTHOR

zero0_one1
