Hasty Briefs (beta)

Powering the agents: Workers AI now runs large models, starting with Kimi K2.5

12 hours ago
  • #Cloudflare
  • #AI Agents
  • #Serverless Inference
  • Cloudflare is enhancing its platform to support the development and deployment of agents with robust infrastructure primitives like Durable Objects, Workflows, and Dynamic Workers.
  • Workers AI now includes Moonshot AI’s Kimi K2.5 model, featuring a 256k context window, multi-turn tool calling, vision inputs, and structured outputs, ideal for agentic tasks.
  • Internal testing of Kimi K2.5 at Cloudflare cut costs by 77% in tasks like automated code reviews and security checks.
  • The rise in personal and coding agents has made cost a primary concern, pushing enterprises towards open-source models like Kimi K2.5 for scalable solutions.
  • Cloudflare has optimized its inference stack for large models like Kimi K2.5, including custom kernels and techniques like prefix caching to improve performance and reduce costs.
  • New features in Workers AI include prefix caching with discounted pricing for cached tokens, a session affinity header for better cache hits, and a revamped asynchronous API for durable workflows.
  • The asynchronous API now uses a pull-based system to process queued requests efficiently, with event notifications on completion, well suited to non-real-time workloads such as code-scanning agents.
  • Developers can start using Kimi K2.5 on Workers AI today, with resources available in developer docs, the Agents SDK starter, and a live playground demo.
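To make the summary above concrete, here is a minimal sketch of how a request to Kimi K2.5 on Workers AI might be assembled over the REST API. The model slug `@cf/moonshotai/kimi-k2.5` and the exact request shape are assumptions, not confirmed identifiers; the Workers AI developer docs have the authoritative names.

```typescript
// Illustrative sketch only: the model slug and body shape are assumed,
// not taken from Cloudflare's docs.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

const MODEL_ID = "@cf/moonshotai/kimi-k2.5"; // hypothetical model slug

// Build (but do not send) a Workers AI REST request for a chat completion.
function buildKimiRequest(accountId: string, messages: ChatMessage[]) {
  return {
    url: `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/run/${MODEL_ID}`,
    method: "POST" as const,
    body: JSON.stringify({ messages }),
  };
}
```

Inside a Worker, the same call would typically go through the `env.AI` binding rather than raw `fetch`, per the Agents SDK starter mentioned above.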
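The prefix-caching pricing mentioned above can be illustrated with a back-of-envelope cost model. The summary only says cached tokens are billed at a discount; the base price and discount rate below are made-up placeholders for illustration.

```typescript
// Hypothetical cost model for prefix caching: cached input tokens are
// billed at a discount relative to freshly processed tokens.
// pricePerToken and cachedDiscount are illustrative values, not
// Cloudflare's actual pricing.
function inputCost(
  totalTokens: number,
  cachedTokens: number,
  pricePerToken = 0.000002, // hypothetical base price per input token
  cachedDiscount = 0.5,     // hypothetical: cached tokens cost 50% less
): number {
  const fresh = totalTokens - cachedTokens;
  return fresh * pricePerToken + cachedTokens * pricePerToken * (1 - cachedDiscount);
}
```

For an agent that re-sends a long shared system prompt every turn, most input tokens hit the cache, so the per-turn input cost drops accordingly. This is also why the session affinity header matters: routing a session's requests to the same backend improves the cache-hit rate.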
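The pull-based asynchronous pattern described above can be sketched with an in-memory queue: requests are enqueued, a consumer pulls them when it has capacity, and a completion event fires when a result is ready. This is an illustration of the general pattern, not the Workers AI async API itself; all names here are invented.

```typescript
// Minimal in-memory sketch of a pull-based job queue with completion
// notifications. Illustrative only; the real Workers AI async API differs.
type Job<T> = { id: string; input: T };

class PullQueue<T, R> {
  private jobs: Job<T>[] = [];
  private results = new Map<string, R>();
  private listeners: Array<(id: string, result: R) => void> = [];

  // Producer side: queue a request for later processing.
  enqueue(job: Job<T>): void {
    this.jobs.push(job);
  }

  // Consumer side: pull the next queued job when capacity allows.
  // Returns false when the queue is empty.
  pull(handler: (input: T) => R): boolean {
    const job = this.jobs.shift();
    if (!job) return false;
    const result = handler(job.input);
    this.results.set(job.id, result);
    this.listeners.forEach((fn) => fn(job.id, result)); // completion event
    return true;
  }

  onComplete(fn: (id: string, result: R) => void): void {
    this.listeners.push(fn);
  }

  result(id: string): R | undefined {
    return this.results.get(id);
  }
}
```

Because the consumer pulls work rather than having it pushed, throughput self-regulates to available capacity, which is why this shape suits non-real-time jobs like batch code scanning.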