OPEN_SOURCE
// INFRASTRUCTURE
Cloudflare makes Kimi K2.5 3x faster
Cloudflare says Workers AI now serves Moonshot’s Kimi K2.5 at production scale, and that a stack of inference optimizations made it roughly 3x faster. The launch positions Kimi as the first large model on Workers AI and pairs it with platform upgrades like custom kernels, prefix caching, session affinity, and async inference.
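The async inference mentioned above generally follows a submit-then-poll shape: the client gets a request id back immediately and fetches the result later. A minimal in-process sketch of that pattern (the class, method names, and fields are illustrative assumptions, not Cloudflare's actual Workers AI API):

```python
# Hypothetical sketch of a submit-then-poll async inference queue,
# simulated in-process. All names here are illustrative, not Cloudflare's.
import uuid


class AsyncInferenceQueue:
    def __init__(self):
        self.jobs: dict[str, dict] = {}

    def submit(self, prompt: str) -> str:
        """Enqueue a request and return a request id immediately."""
        request_id = str(uuid.uuid4())
        self.jobs[request_id] = {"status": "queued", "prompt": prompt, "result": None}
        return request_id

    def run_pending(self):
        """Worker loop body: drain queued jobs (the model call is stubbed)."""
        for job in self.jobs.values():
            if job["status"] == "queued":
                job["result"] = f"analysis of: {job['prompt']}"
                job["status"] = "complete"

    def poll(self, request_id: str) -> dict:
        """Check on a previously submitted request."""
        job = self.jobs[request_id]
        return {"status": job["status"], "result": job["result"]}


queue = AsyncInferenceQueue()
rid = queue.submit("scan repo for injection bugs")
first = queue.poll(rid)      # not ready yet: status is "queued"
queue.run_pending()          # worker processes the backlog
done = queue.poll(rid)       # status is now "complete"
```

The point of the shape is that slow, batch-friendly workloads (code scanning, research agents) never hold a connection open while waiting on GPU capacity.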
// ANALYSIS
This is the real story behind “fast AI”: the model matters, but the serving stack is where most of the leverage lives. Cloudflare is showing that frontier open models are becoming an infrastructure problem, not just a benchmark problem.
- Custom kernels and disaggregated prefill are the kind of low-level wins that most teams cannot reproduce on their own
- `x-session-affinity` is a smart way to convert repeated agent context into cache hits, lower TTFT, and lower token spend
- The async API is the right fit for code scanning and research agents where reliability matters more than immediate response
- The 77% cost reduction claim is the strongest signal here: open weights only become operationally relevant when the serving economics work
- For teams self-hosting Kimi, this is a reminder that “out of the box” throughput is usually leaving money on the table
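The session-affinity point above is worth making concrete: if an agent keeps the same `x-session-affinity` value across turns, the server can route it to the same node and reuse the KV state for the shared prompt prefix, so only the new suffix needs prefill. A toy sketch of that mechanism (the class and the cache bookkeeping are assumptions for illustration; only the header name comes from the source):

```python
# Hypothetical sketch: a session-affinity header turning repeated agent
# context into prefix-cache hits. Everything except the header name is
# illustrative, not Cloudflare's implementation.
class PrefixCachingServer:
    def __init__(self):
        # per-session record of the longest prompt already processed
        self.cache: dict[str, str] = {}

    def infer(self, headers: dict[str, str], prompt: str) -> dict:
        session = headers.get("x-session-affinity")
        cached = self.cache.get(session, "") if session else ""
        if prompt.startswith(cached):
            # cached prefix is reusable: prefill only the new suffix
            reused, new_tokens = len(cached), prompt[len(cached):]
        else:
            # prefix diverged (or no session): recompute everything
            reused, new_tokens = 0, prompt
        if session:
            self.cache[session] = prompt
        return {"prefilled_from_cache": reused, "prefill_work": len(new_tokens)}


server = PrefixCachingServer()
ctx = "SYSTEM: you are a code-review agent.\n"
r1 = server.infer({"x-session-affinity": "agent-42"}, ctx + "Review file A")
r2 = server.infer({"x-session-affinity": "agent-42"},
                  ctx + "Review file A" + "\nReview file B")
# r2 reuses the cached turn-1 prefix, so prefill covers only the new line
```

Lower prefill work is exactly where the claimed TTFT and token-cost wins come from: the expensive part of each turn shrinks to the delta since the last request.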
// TAGS
kimi-k2-5 · inference · gpu · cloud · agent · llm
DISCOVERED
3h ago
2026-04-16
PUBLISHED
4h ago
2026-04-16
RELEVANCE
9/10
AUTHOR
dok2001