Cursor Composer: Building a fast frontier model with RL
6 months ago
- #AI
- #Software Engineering
- #Reinforcement Learning
- Composer is a new agent model built for both software engineering intelligence and speed: it achieves frontier coding results while generating four times faster than similar models.
- The model is trained to complete real-world software engineering challenges in large codebases, using production search and editing tools to solve diverse problems efficiently.
- Composer is a mixture-of-experts (MoE) language model specialized for software engineering through reinforcement learning (RL), supporting long-context generation and understanding.
- The model is evaluated using Cursor Bench, a benchmark measuring usefulness to software developers, including correctness and adherence to codebase practices.
- Reinforcement learning optimizes the model for interactive development, incentivizing efficient tool use, parallelism, and minimizing unnecessary responses.
- Training infrastructure leverages PyTorch and Ray for asynchronous RL at scale, using MXFP8 MoE kernels for low-precision training and faster inference.
- Composer can call various tools in the Cursor Agent harness, so training it effectively requires running hundreds of thousands of concurrent sandboxed coding environments.
- Engineers at Cursor already use the model for day-to-day software development, and the goal is for it to be a valuable tool for users as well.
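
To make the point about parallel tool use concrete, here is a minimal sketch of why issuing independent tool calls concurrently is faster than issuing them one at a time. It is not Cursor's harness: the tool functions `search_code` and `read_file`, their arguments, and the simulated latency are all hypothetical stand-ins chosen for illustration.

```python
import asyncio
import time

TOOL_LATENCY = 0.1  # assumed per-call latency in seconds, purely illustrative


# Hypothetical stand-ins for agent tools; not the real Cursor Agent tool set.
async def search_code(query: str) -> str:
    await asyncio.sleep(TOOL_LATENCY)  # simulate waiting on I/O
    return f"results for {query!r}"


async def read_file(path: str) -> str:
    await asyncio.sleep(TOOL_LATENCY)  # simulate waiting on I/O
    return f"contents of {path}"


async def run_sequential() -> list[str]:
    # Each call waits for the previous one to finish.
    first = await search_code("parse_config")
    second = await read_file("src/config.py")
    return [first, second]


async def run_parallel() -> list[str]:
    # Independent calls are issued together; gather preserves argument order.
    results = await asyncio.gather(
        search_code("parse_config"),
        read_file("src/config.py"),
    )
    return list(results)


def timed(coro_fn):
    """Run an async function to completion and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro_fn())
    return result, time.perf_counter() - start


seq_result, seq_time = timed(run_sequential)
par_result, par_time = timed(run_parallel)
print(f"sequential: {seq_time:.2f}s, parallel: {par_time:.2f}s")
```

Both strategies return the same results, but the parallel version takes about as long as the slowest single call instead of the sum of all calls. That wall-clock difference is the kind of efficiency a speed-oriented training signal could favor.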