Grok 4.3 regresses on benchmark, cuts costs

// 50d ago · BENCHMARK RESULT

A Reddit post highlights a sizable performance drop for Grok 4.3 on the Extended NYT Connections Benchmark, falling from 93.4 on Grok 4.20 0309 to 67.5. The tradeoff is that the newer run appears cheaper, so this reads less like a clean upgrade and more like a cost/performance swap that sacrifices puzzle-solving quality.

// ANALYSIS

This is the kind of benchmark regression that makes “better and cheaper” claims look premature unless the product can defend the quality drop on real workloads.

  • The headline number is a major regression, not a marginal wobble: 93.4 down to 67.5 is a 25.9-point drop, about 28% in relative terms, which is large enough to signal a real change in capability.
  • The lower cost matters, but only if the cheaper run still meets the user’s bar; here, the benchmark suggests it may not.
  • Extended NYT Connections is a narrow benchmark, so it should be treated as one signal rather than a full product verdict.
  • If this reflects an actual model change rather than a prompt/eval artifact, it points to a release that optimized efficiency at the expense of reasoning consistency.
  • The GitHub benchmark referenced in the post makes this easy to reproduce or challenge, which increases the credibility of the comparison.
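
To make the size of the swap concrete, here is a minimal Python sketch of the arithmetic behind the two scores quoted above. The function names are hypothetical, and the "score per unit cost" break-even is an assumed, deliberately naive metric rather than anything the post or the benchmark defines; the post gives no cost figures, so none are used here.

```python
def regression_summary(old_score: float, new_score: float) -> dict:
    """Absolute and relative drop between two benchmark scores."""
    if old_score <= 0:
        raise ValueError("old_score must be positive")
    drop = old_score - new_score
    return {
        "absolute_drop": round(drop, 1),
        "relative_drop_pct": round(100 * drop / old_score, 1),
    }


def break_even_cost_pct(old_score: float, new_score: float) -> float:
    """Cost of the new run, as a percentage of the old run's cost, at which
    score-per-unit-cost is equal (assumed naive metric, for illustration only)."""
    if old_score <= 0:
        raise ValueError("old_score must be positive")
    return round(100 * new_score / old_score, 1)


# Scores quoted in the summary: Grok 4.20 0309 vs. Grok 4.3.
summary = regression_summary(93.4, 67.5)
break_even = break_even_cost_pct(93.4, 67.5)
print(summary)
print(break_even)
```

Under that assumed metric, the newer run would need to cost less than roughly 72% of the older run just to match it on score per unit cost, and that still ignores whether a score of 67.5 clears the user's quality bar at all.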
// TAGS
grok, xai, benchmark, nyt-connections, llm, reasoning, performance-regression, cost-efficiency

DISCOVERED: 2026-05-02 (50d ago)
PUBLISHED: 2026-05-01 (50d ago)
RELEVANCE: 8/10
AUTHOR: zero0_one1