Grok 4.3 regresses on benchmark, cuts costs

// BENCHMARK RESULT

A Reddit post highlights a sizable performance drop for Grok 4.3 on the Extended NYT Connections Benchmark: its score falls to 67.5, down from 93.4 for Grok 4.20 0309. The newer run appears to be cheaper, so this reads less like a clean upgrade and more like a cost/performance tradeoff that sacrifices puzzle-solving quality.

// ANALYSIS

This is the kind of benchmark regression that makes “better and cheaper” claims look premature unless the product can defend the quality drop on real workloads.

–The headline number is a major regression, not a marginal wobble: falling from 93.4 to 67.5 is a 25.9-point drop (about 28% relative), large enough to signal a meaningful change in capability.
–The lower cost matters, but only if the cheaper run still meets the user’s bar; here, the benchmark suggests it may not.
–Extended NYT Connections is a narrow benchmark, so it should be treated as one signal rather than a full product verdict.
–If this reflects an actual model change rather than a prompt/eval artifact, it points to a release that optimized efficiency at the expense of reasoning consistency.
–Because the benchmark referenced in the post is public on GitHub, the result is easy to reproduce or challenge, which adds credibility to the comparison.
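The size of the regression and the "cheaper only counts if it clears the bar" argument can be made concrete with a few lines of arithmetic. The Python sketch below computes the absolute and relative drop from the two scores reported in the post. The `min_score` quality bar is a hypothetical, user-chosen threshold (80.0 here purely for illustration), not a figure from the post, and no cost numbers are assumed.

```python
# Minimal sketch: quantify the reported regression and apply a simple quality bar.
# Scores (93.4 and 67.5) come from the post; min_score is a hypothetical threshold.

def score_drop(old: float, new: float) -> tuple[float, float]:
    """Return (absolute drop in points, relative drop as a fraction of the old score)."""
    absolute = old - new
    return absolute, absolute / old


def cheaper_run_acceptable(score: float, min_score: float) -> bool:
    """A cheaper model is only a win if its score still meets the user's bar."""
    return score >= min_score


absolute, relative = score_drop(93.4, 67.5)
print(f"{absolute:.1f} points, {relative:.1%} relative")  # 25.9 points, 27.7% relative

# Hypothetical bar of 80.0: the older score clears it, the newer one does not.
print(cheaper_run_acceptable(93.4, min_score=80.0))
print(cheaper_run_acceptable(67.5, min_score=80.0))
```

Whatever the price difference turns out to be, the check only flips in favor of the cheaper model if its score clears whatever bar the workload actually requires.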

// TAGS

grok, xai, benchmark, nyt-connections, llm, reasoning, performance-regression, cost-efficiency

DISCOVERED

2026-05-02

PUBLISHED

2026-05-01

RELEVANCE

8/10

AUTHOR

zero0_one1
