Cloudflare makes Kimi K2.5 3x faster

Cloudflare says Workers AI now serves Moonshot’s Kimi K2.5 at production scale, and that a stack of inference optimizations made it roughly 3x faster. The launch positions Kimi as the first large model on Workers AI and pairs it with platform upgrades: custom kernels, prefix caching, session affinity, and async inference.

// ANALYSIS

This is the real story behind “fast AI”: the model matters, but the serving stack is where most of the leverage lives. Cloudflare is showing that frontier open models are becoming an infrastructure problem, not just a benchmark problem.

  • Custom kernels and disaggregated prefill are the kind of low-level wins that most teams cannot reproduce on their own
  • `x-session-affinity` is a smart way to convert repeated agent context into cache hits, lowering time-to-first-token (TTFT) and token spend
  • The async API is the right fit for code scanning and research agents where reliability matters more than immediate response
  • The 77% cost reduction claim is the strongest signal here: open weights only become operationally relevant when the serving economics work
  • For teams self-hosting Kimi, this is a reminder that “out of the box” throughput is usually leaving money on the table
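To make the session-affinity point concrete, here is a minimal sketch of the underlying idea: requests carrying the same affinity key are routed deterministically to the same inference node, so that node's prefix cache (the KV cache for the shared conversation prefix) keeps getting hits instead of being recomputed elsewhere. The node names and hash-based routing scheme below are illustrative assumptions, not Cloudflare's actual implementation; only the `x-session-affinity` header name comes from the announcement.

```python
import hashlib

# Hypothetical pool of inference nodes (illustrative names).
NODES = ["node-a", "node-b", "node-c"]

def route(session_key: str, nodes=NODES) -> str:
    """Deterministically map an x-session-affinity value to one node.

    Same key -> same node, so the node's cached prefill work for the
    session's shared prompt prefix can be reused on every follow-up call.
    """
    digest = hashlib.sha256(session_key.encode()).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]

# Every request in an agent session lands on the same node.
first = route("agent-session-42")
assert all(route("agent-session-42") == first for _ in range(100))
```

Without affinity, a long-running agent's requests may bounce between nodes, each paying full prefill cost on the same context; pinning the session is what turns prefix caching into a real TTFT and cost win.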

// TAGS

kimi-k2-5 · inference · gpu · cloud · agent · llm

DISCOVERED

3h ago

2026-04-16

PUBLISHED

4h ago

2026-04-16

RELEVANCE

9/10

AUTHOR

dok2001