Powering the agents: Workers AI now runs large models, starting with Kimi K2.5
9 hours ago
- #Cloudflare
- #AI Agents
- #Serverless Inference
- Cloudflare is building out its platform for developing and deploying agents, backed by infrastructure primitives like Durable Objects, Workflows, and Dynamic Workers.
- Workers AI now includes Moonshot AI’s Kimi K2.5 model, featuring a 256k-token context window, multi-turn tool calling, vision inputs, and structured outputs, making it well suited to agentic tasks.
- Internal testing of Kimi K2.5 at Cloudflare showed a 77% cost reduction and strong efficiency on tasks like automated code reviews and security checks.
- The rise in personal and coding agents has made cost a primary concern, pushing enterprises towards open-source models like Kimi K2.5 for scalable solutions.
- Cloudflare has optimized its inference stack for large models like Kimi K2.5, including custom kernels and techniques like prefix caching to improve performance and reduce costs.
- New features in Workers AI include prefix caching with discounted pricing for cached tokens, a session affinity header for better cache hits, and a revamped asynchronous API for durable workflows.
- The asynchronous API now uses a pull-based system to process queued requests efficiently, with event notifications on completion, making it well suited to non-real-time workloads like code-scanning agents.
- Developers can start using Kimi K2.5 on Workers AI today, with resources available in developer docs, the Agents SDK starter, and a live playground demo.
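To make the agentic features listed above concrete, here is a minimal sketch of assembling a chat request that exercises multi-turn messages, tool calling, and structured outputs. The model identifier, endpoint path, and field names are assumptions for illustration only; consult the Workers AI developer docs for the exact API shape.

```python
import json

# Assumed model identifier; check the Workers AI model catalog for the real one.
MODEL = "@cf/moonshotai/kimi-k2.5"

def build_agent_request(messages, tools=None, response_schema=None):
    """Assemble a chat request body using the features described above:
    multi-turn messages, tool calling, and structured (JSON-schema) outputs."""
    body = {"messages": messages}
    if tools:
        # Function definitions the model may choose to call mid-conversation.
        body["tools"] = tools
    if response_schema:
        # Structured outputs: constrain the model's reply to a JSON schema.
        body["response_format"] = {"type": "json_schema",
                                   "json_schema": response_schema}
    return body

payload = build_agent_request(
    messages=[{"role": "user",
               "content": "Review this diff for security issues."}],
    tools=[{"type": "function",
            "function": {"name": "fetch_diff",
                         "description": "Fetch a pull-request diff by number",
                         "parameters": {"type": "object",
                                        "properties": {"pr": {"type": "integer"}},
                                        "required": ["pr"]}}}],
    response_schema={"type": "object",
                     "properties": {"findings": {"type": "array",
                                                 "items": {"type": "string"}}}},
)
print(json.dumps(payload, indent=2))
```

In practice a body like this would be POSTed to the Workers AI inference endpoint for the chosen model with an API token, or passed to the AI binding from a Worker.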
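The pull-based asynchronous flow described above can be sketched as a small in-memory model: requests are queued with an ID, processed in the background, and the client later pulls the result instead of holding a connection open. All class and method names here are illustrative, not the actual Workers AI API.

```python
import itertools

class AsyncInferenceQueue:
    """Illustrative sketch of a pull-based queue: submit() enqueues a request
    and returns an ID; the platform drains the queue in the background; the
    client pulls the finished result by ID when notified of completion."""

    _ids = itertools.count(1)

    def __init__(self):
        self._pending = {}   # request_id -> payload awaiting processing
        self._done = {}      # request_id -> completed result

    def submit(self, payload):
        # Enqueue a request and immediately return its ID (no open connection).
        request_id = f"req-{next(self._ids)}"
        self._pending[request_id] = payload
        return request_id

    def process_next(self):
        # Stand-in for the platform-side worker pulling from the queue.
        if not self._pending:
            return None
        request_id, payload = self._pending.popitem()
        self._done[request_id] = {"status": "complete",
                                  "result": f"reviewed {payload['file']}"}
        return request_id

    def pull(self, request_id):
        # Client-side pull: returns the result once processing has finished.
        return self._done.get(request_id, {"status": "queued"})

queue = AsyncInferenceQueue()
rid = queue.submit({"file": "main.py"})
print(queue.pull(rid)["status"])   # still queued before processing
queue.process_next()
print(queue.pull(rid)["status"])   # complete after the worker drains it
```

This is why the pattern suits non-real-time agents such as code scanners: the submitter never blocks on inference latency, and completion events tell it exactly when to pull.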