
Anthropic cache TTLs backfire on costs

A developer reports that disabling prompt caching on Anthropic models slightly lowered their bill, suggesting the newer cache TTL economics can be worse than expected for low-reuse workloads. The lesson is that prompt caching only pays when the same context gets reused often enough to offset write overhead.

// ANALYSIS

Prompt caching is an optimization, not a guarantee. If your workload is bursty, one-shot, or has long gaps between turns, the cache write tax can outweigh the savings from cache reads.
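Concretely, caching is opted into per content block in Anthropic's Messages API via a `cache_control` field, with the TTL chosen at write time. A minimal sketch of that request shape (field names follow Anthropic's published docs; the helper function and placeholder text are our own):

```python
# Sketch of the content-block shape used to opt into Anthropic prompt
# caching. Per the published pricing, a cache write costs 1.25x base
# input tokens for the 5m TTL and 2x for the 1h TTL; cache reads cost
# 0.1x. The helper below is illustrative, not part of the SDK.

def build_cached_system_block(text: str, ttl: str = "5m") -> dict:
    """Return a system content block that opts into prompt caching
    with the given TTL ("5m" is the default; "1h" costs more to write)."""
    if ttl not in ("5m", "1h"):
        raise ValueError("Anthropic cache TTL must be '5m' or '1h'")
    return {
        "type": "text",
        "text": text,
        # "ephemeral" is the cache type; ttl selects the 5m or 1h tier.
        "cache_control": {"type": "ephemeral", "ttl": ttl},
    }

block = build_cached_system_block("<large shared context>", ttl="1h")
```

The block would then be passed in the `system` (or `messages`) array of a `messages.create` call; the point is that the TTL, and therefore the write premium, is a per-block choice the caller makes up front.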

  • Anthropic’s prompt caching supports 5-minute and 1-hour TTLs, and the longer-lived cache costs more to write (2x base input price vs 1.25x for the 5-minute tier), so it has a higher reuse bar to clear
  • Long-running agent sessions and repeated codebase lookups are the best fit; sparse request patterns are the worst fit
  • Per-session economics matter more than aggregate token counts, so “more caching” can still mean “more spend”
  • This is a reminder to measure infra at the workload level before assuming the provider’s default is cost-optimal
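The break-even arithmetic behind those bullets fits in a few lines. A back-of-envelope model, assuming Anthropic's published multipliers (cache write at 1.25x or 2x base input depending on TTL, cache reads at 0.1x):

```python
# Back-of-envelope: cost of caching a prompt prefix vs resending it,
# as a ratio (< 1.0 means caching saves money). Multipliers assume
# Anthropic's published pricing: 5m-TTL write = 1.25x base input,
# 1h-TTL write = 2x, cache read = 0.1x.

def cached_vs_uncached(prompt_tokens: int, reuses: int,
                       write_mult: float = 1.25,
                       read_mult: float = 0.1) -> float:
    """Ratio of cached to uncached cost for a prefix reused `reuses`
    times within the TTL: one cache write plus `reuses` cache reads,
    versus sending the full prefix `reuses + 1` times."""
    uncached = prompt_tokens * (reuses + 1)
    cached = prompt_tokens * (write_mult + read_mult * reuses)
    return cached / uncached

# Zero reuse: the write premium is pure overhead (ratio 1.25, i.e.
# 25% more expensive than not caching at all).
no_reuse = cached_vs_uncached(100_000, reuses=0)

# A single reuse within the TTL already flips the economics.
one_reuse = cached_vs_uncached(100_000, reuses=1)
```

This is exactly the failure mode in the report: with `reuses=0` the cache write is a pure tax, which is why a low-reuse workload can see its bill drop when caching is turned off.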
// TAGS
anthropic, llm, pricing, inference, api

DISCOVERED

4h ago

2026-04-30

PUBLISHED

4h ago

2026-04-30

RELEVANCE

8/10

AUTHOR

theo