Claude thinking changes break cache, Codex may not
A developer's unscientific test suggests Claude invalidates its cache when thinking level changes, while Codex may keep reusing cached context. If true, that difference could skew latency, cost, and benchmark comparisons.
If this holds up, cache behavior matters almost as much as the model setting itself. A reasoning-level toggle that preserves cache can make Codex look faster and cheaper, but it also makes apples-to-apples testing much harder. Anthropic documents that changing thinking parameters invalidates cache breakpoints, so Claude's behavior matches the published rules. If Codex keeps cache across reasoning changes, developers need to pin cache state during evals or results will drift. Reused context is useful for iterative coding, but it can hide whether a reasoning-level switch actually changes behavior. This is a reproducibility issue as much as a performance one: latency, cost, and output quality all become harder to compare cleanly.
DISCOVERED
1h ago
2026-05-07
PUBLISHED
1h ago
2026-05-07
RELEVANCE
AUTHOR
RhysSullivan