
Anthropic cache TTLs backfire on costs

A developer reports that disabling prompt caching on Anthropic models slightly lowered their bill, suggesting the newer cache TTL economics can be worse than expected for low-reuse workloads. The lesson is that prompt caching only pays when the same context gets reused often enough to offset write overhead.

// ANALYSIS

Prompt caching is an optimization, not a guarantee. If your workload is bursty, one-shot, or has long gaps between turns, the cache write tax can outweigh the savings from cache reads.
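Concretely, caching is opted into per content block in Anthropic's Messages API via a `cache_control` field, with the TTL chosen at write time. A minimal sketch of that request shape (field names follow Anthropic's published docs; the helper function and placeholder text are our own):

```python
# Sketch of the content-block shape used to opt into Anthropic prompt
# caching. Per the published pricing, a cache write costs 1.25x base
# input tokens for the 5m TTL and 2x for the 1h TTL; cache reads cost
# 0.1x. The helper below is illustrative, not part of the SDK.

def build_cached_system_block(text: str, ttl: str = "5m") -> dict:
    """Return a system content block that opts into prompt caching
    with the given TTL ("5m" is the default; "1h" costs more to write)."""
    if ttl not in ("5m", "1h"):
        raise ValueError("Anthropic cache TTL must be '5m' or '1h'")
    return {
        "type": "text",
        "text": text,
        # "ephemeral" is the cache type; ttl selects the 5m or 1h tier.
        "cache_control": {"type": "ephemeral", "ttl": ttl},
    }

block = build_cached_system_block("<large shared context>", ttl="1h")
```

The block would then be passed in the `system` (or `messages`) array of a `messages.create` call; the point is that the TTL, and therefore the write premium, is a per-block choice the caller makes up front.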

  • Anthropic’s prompt caching supports 5-minute and 1-hour TTLs, and the longer-lived cache costs more to write (2x base input price vs 1.25x for the 5-minute tier), so it has a higher reuse bar to clear
  • Long-running agent sessions and repeated codebase lookups are the best fit; sparse request patterns are the worst fit
  • Per-session economics matter more than aggregate token counts, so “more caching” can still mean “more spend”
  • This is a reminder to measure infra at the workload level before assuming the provider’s default is cost-optimal
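The break-even arithmetic behind those bullets fits in a few lines. A back-of-envelope model, assuming Anthropic's published multipliers (cache write at 1.25x or 2x base input depending on TTL, cache reads at 0.1x):

```python
# Back-of-envelope: cost of caching a prompt prefix vs resending it,
# as a ratio (< 1.0 means caching saves money). Multipliers assume
# Anthropic's published pricing: 5m-TTL write = 1.25x base input,
# 1h-TTL write = 2x, cache read = 0.1x.

def cached_vs_uncached(prompt_tokens: int, reuses: int,
                       write_mult: float = 1.25,
                       read_mult: float = 0.1) -> float:
    """Ratio of cached to uncached cost for a prefix reused `reuses`
    times within the TTL: one cache write plus `reuses` cache reads,
    versus sending the full prefix `reuses + 1` times."""
    uncached = prompt_tokens * (reuses + 1)
    cached = prompt_tokens * (write_mult + read_mult * reuses)
    return cached / uncached

# Zero reuse: the write premium is pure overhead (ratio 1.25, i.e.
# 25% more expensive than not caching at all).
no_reuse = cached_vs_uncached(100_000, reuses=0)

# A single reuse within the TTL already flips the economics.
one_reuse = cached_vs_uncached(100_000, reuses=1)
```

This is exactly the failure mode in the report: with `reuses=0` the cache write is a pure tax, which is why a low-reuse workload can see its bill drop when caching is turned off.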
// TAGS
anthropic, llm, pricing, inference, api

DISCOVERED

4h ago

2026-04-30

PUBLISHED

4h ago

2026-04-30

RELEVANCE

8/10

AUTHOR

theo