GPT-5.5 fails LiveBench agentic coding test
OpenAI's latest model, GPT-5.5, has debuted with a significant regression in agentic coding on the LiveBench benchmark, scoring 56.67 compared to GPT-5.4's 70.00. While marketed as OpenAI's strongest agentic model to date, the "Thinking" variant currently ranks 11th on the contamination-free leaderboard, falling behind both its predecessor and older models like GPT-5.1 Codex.
GPT-5.5's LiveBench result exposes a critical trade-off between task-completion efficiency and novel-logic reasoning in OpenAI's architecture. The 13.33-point drop highlights a struggle with non-contaminated problems, suggesting the model may be over-optimized for existing datasets. While performance on Terminal-Bench 2.0 and SWE-Bench Pro remains high, a 40% reduction in token usage suggests that optimizing for cost and speed may be degrading deep reasoning. Community reports of "lazy" code completions and context drift further align with these benchmark regressions.
DISCOVERED: 2026-04-25
PUBLISHED: 2026-04-25
AUTHOR: Keybug