Grok 4.3 regresses on benchmark, cuts costs
REDDIT · 1d ago · BENCHMARK RESULT

A Reddit post highlights a sizable performance drop for Grok 4.3 on the Extended NYT Connections Benchmark: it scores 67.5, down from 93.4 for Grok 4.20 0309. The newer run appears cheaper, so this reads less like a clean upgrade and more like a cost/performance swap that sacrifices puzzle-solving quality.

// ANALYSIS

This is the kind of benchmark regression that makes “better and cheaper” claims look premature unless the product can defend the quality drop on real workloads.

  • The headline number is a major regression, not a marginal wobble: 93.4 down to 67.5 is a 25.9-point drop (roughly 28% in relative terms), large enough to be a meaningful change in capability.
  • The lower cost matters, but only if the cheaper run still meets the user’s bar; here, the benchmark suggests it may not.
  • Extended NYT Connections is a narrow benchmark, so it should be treated as one signal rather than a full product verdict.
  • If this reflects an actual model change rather than a prompt/eval artifact, it points to a release that optimized efficiency at the expense of reasoning consistency.
  • The GitHub benchmark referenced in the post makes this easy to reproduce or challenge, which increases the credibility of the comparison.
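
The size of the drop discussed above can be checked with a few lines of arithmetic. This is a minimal sketch in Python that uses only the two scores quoted in the post; the variable names are illustrative and are not taken from the benchmark repository.

```python
# Scores quoted in the Reddit post (Extended NYT Connections Benchmark).
old_score = 93.4  # Grok 4.20 0309
new_score = 67.5  # Grok 4.3

# Absolute drop in benchmark points.
abs_drop = old_score - new_score

# Relative drop as a percentage of the older score.
rel_drop_pct = abs_drop / old_score * 100

print(f"absolute drop: {abs_drop:.1f} points")   # 25.9 points
print(f"relative drop: {rel_drop_pct:.1f}%")     # 27.7%
```

The post does not quantify the cost difference, so no cost-per-point comparison is attempted here; that would need the actual per-run cost figures from the benchmark results.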
// TAGS
grok, xai, benchmark, nyt-connections, llm, reasoning, performance-regression, cost-efficiency

DISCOVERED: 2026-05-02
PUBLISHED: 2026-05-01
RELEVANCE: 8/10
AUTHOR: zero0_one1