Hasty Briefs (beta)

Powering the agents: Workers AI now runs large models, starting with Kimi K2.5

12 hours ago
  • #Cloudflare
  • #AI Agents
  • #Serverless Inference
  • Cloudflare is enhancing its platform to support the development and deployment of agents with robust infrastructure primitives like Durable Objects, Workflows, and Dynamic Workers.
  • Workers AI now includes Moonshot AI’s Kimi K2.5 model, featuring a 256k context window, multi-turn tool calling, vision inputs, and structured outputs, ideal for agentic tasks.
  • Internal testing of Kimi K2.5 at Cloudflare cut costs by 77% in tasks like automated code reviews and security checks.
  • The rise in personal and coding agents has made cost a primary concern, pushing enterprises towards open-source models like Kimi K2.5 for scalable solutions.
  • Cloudflare has optimized its inference stack for large models like Kimi K2.5, including custom kernels and techniques like prefix caching to improve performance and reduce costs.
  • New features in Workers AI include prefix caching with discounted pricing for cached tokens, a session affinity header for better cache hits, and a revamped asynchronous API for durable workflows.
  • The asynchronous API now uses a pull-based system to process queued requests efficiently, with event notifications on completion, well suited to non-real-time workloads such as code-scanning agents.
  • Developers can start using Kimi K2.5 on Workers AI today, with resources available in developer docs, the Agents SDK starter, and a live playground demo.
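To make the summary above concrete, here is a minimal sketch of how a request to Kimi K2.5 on Workers AI might be assembled over the REST API. The model slug `@cf/moonshotai/kimi-k2.5` and the exact request shape are assumptions, not confirmed identifiers; the Workers AI developer docs have the authoritative names.

```typescript
// Illustrative sketch only: the model slug and body shape are assumed,
// not taken from Cloudflare's docs.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

const MODEL_ID = "@cf/moonshotai/kimi-k2.5"; // hypothetical model slug

// Build (but do not send) a Workers AI REST request for a chat completion.
function buildKimiRequest(accountId: string, messages: ChatMessage[]) {
  return {
    url: `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/run/${MODEL_ID}`,
    method: "POST" as const,
    body: JSON.stringify({ messages }),
  };
}
```

Inside a Worker, the same call would typically go through the `env.AI` binding rather than raw `fetch`, per the Agents SDK starter mentioned above.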
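The prefix-caching pricing mentioned above can be illustrated with a back-of-envelope cost model. The summary only says cached tokens are billed at a discount; the base price and discount rate below are made-up placeholders for illustration.

```typescript
// Hypothetical cost model for prefix caching: cached input tokens are
// billed at a discount relative to freshly processed tokens.
// pricePerToken and cachedDiscount are illustrative values, not
// Cloudflare's actual pricing.
function inputCost(
  totalTokens: number,
  cachedTokens: number,
  pricePerToken = 0.000002, // hypothetical base price per input token
  cachedDiscount = 0.5,     // hypothetical: cached tokens cost 50% less
): number {
  const fresh = totalTokens - cachedTokens;
  return fresh * pricePerToken + cachedTokens * pricePerToken * (1 - cachedDiscount);
}
```

For an agent that re-sends a long shared system prompt every turn, most input tokens hit the cache, so the per-turn input cost drops accordingly. This is also why the session affinity header matters: routing a session's requests to the same backend improves the cache-hit rate.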
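The pull-based asynchronous pattern described above can be sketched with an in-memory queue: requests are enqueued, a consumer pulls them when it has capacity, and a completion event fires when a result is ready. This is an illustration of the general pattern, not the Workers AI async API itself; all names here are invented.

```typescript
// Minimal in-memory sketch of a pull-based job queue with completion
// notifications. Illustrative only; the real Workers AI async API differs.
type Job<T> = { id: string; input: T };

class PullQueue<T, R> {
  private jobs: Job<T>[] = [];
  private results = new Map<string, R>();
  private listeners: Array<(id: string, result: R) => void> = [];

  // Producer side: queue a request for later processing.
  enqueue(job: Job<T>): void {
    this.jobs.push(job);
  }

  // Consumer side: pull the next queued job when capacity allows.
  // Returns false when the queue is empty.
  pull(handler: (input: T) => R): boolean {
    const job = this.jobs.shift();
    if (!job) return false;
    const result = handler(job.input);
    this.results.set(job.id, result);
    this.listeners.forEach((fn) => fn(job.id, result)); // completion event
    return true;
  }

  onComplete(fn: (id: string, result: R) => void): void {
    this.listeners.push(fn);
  }

  result(id: string): R | undefined {
    return this.results.get(id);
  }
}
```

Because the consumer pulls work rather than having it pushed, throughput self-regulates to available capacity, which is why this shape suits non-real-time jobs like batch code scanning.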