Building an RL environment to train agents for production debugging
20 days ago
- #Ops Diagnostics
- #RL Environment
- #Automation
- Developed an RL environment for ops diagnostics, enabling agents to investigate across Sentry, Supabase, Railway, and Kubernetes.
- Trained a model on 24 real production tasks, achieving a 2x performance improvement over the base model.
- Engineers spend 10-20% of their time debugging production issues, motivating automated investigation tooling.
- Designed an architecture of specialized subagents (Sentry, Supabase, Kubernetes) rather than giving a single agent every tool, improving efficiency.
- Released the architecture as a public HUD environment called cross-service-diagnostics on GitHub.
- Trained the Sentry subagent on 24 diverse tasks from real production data, ensuring generalization.
- Used reinforcement learning to optimize the subagent; training took around 13 hours and produced 3,000+ traces.
- Achieved 2x better performance with the trained model (sentry-o4-mini) compared to the base model.
- The environment principles apply beyond ops diagnostics to any RL environment for tool-using agents.
- Full trajectories of agent investigations are captured and can be replayed for analysis.
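An environment like the one described usually exposes a reset/step loop to the agent: reset yields an investigation prompt, and each step either runs a tool call or accepts a final diagnosis. Here is a minimal sketch of that shape; the class, tool names, and reward logic are all hypothetical, not the actual cross-service-diagnostics API.

```python
import json


class DiagnosticsEnv:
    """Toy ops-diagnostics environment with a reset/step interface.

    Illustrative only: a real environment would query live services
    (Sentry, Supabase, Kubernetes) instead of returning canned output.
    """

    def __init__(self):
        # One hypothetical task; the post's environment has 24 real ones.
        self.tasks = [
            {"prompt": "Users report 500s on checkout",
             "root_cause": "db_connection_pool_exhausted"},
        ]
        self._task = None

    def reset(self, task_id: int = 0) -> str:
        """Start an investigation and return its prompt."""
        self._task = self.tasks[task_id]
        return self._task["prompt"]

    def step(self, action: dict):
        """action is either {"tool": ..., "args": ...} or {"answer": ...}.

        Returns (observation, reward, done): reward is granted only
        when the agent submits the correct root cause.
        """
        if "answer" in action:
            reward = 1.0 if action["answer"] == self._task["root_cause"] else 0.0
            return "", reward, True
        # Canned tool output standing in for a real service query.
        obs = json.dumps({"tool": action["tool"],
                          "result": "db_connection_pool_exhausted errors in recent logs"})
        return obs, 0.0, False


env = DiagnosticsEnv()
prompt = env.reset()
obs, r, done = env.step({"tool": "sentry.search_issues", "args": {"query": "500"}})
obs, r, done = env.step({"answer": "db_connection_pool_exhausted"})
```

The agent only scores when its final diagnosis matches ground truth, which is what makes tasks gradeable for RL.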
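The subagent split means each specialist sees only its own toolset, with an orchestrator deciding who handles a question. The subagent names below follow the post, but the keyword routing is a stand-in: a real orchestrator would let an LLM pick the specialist.

```python
# Each subagent gets a small, focused toolset instead of one agent
# holding every tool. Tool names here are hypothetical.
SUBAGENT_TOOLS = {
    "sentry": ["search_issues", "get_event", "list_releases"],
    "supabase": ["run_sql", "get_table_schema"],
    "kubernetes": ["get_pods", "get_logs", "describe_deployment"],
}


def route(query: str) -> str:
    """Pick a subagent by naive keyword match (illustrative only;
    in practice an LLM orchestrator would make this call)."""
    keywords = {
        "sentry": ["exception", "error rate", "stack trace"],
        "supabase": ["table", "query", "row"],
        "kubernetes": ["pod", "deployment", "crashloop"],
    }
    q = query.lower()
    for agent, words in keywords.items():
        if any(w in q for w in words):
            return agent
    return "sentry"  # default specialist for generic error reports
```

Keeping each context window focused on one service's tools is the efficiency win the post describes.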
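With 3,000+ traces collected, the RL step needs per-trace rewards turned into a learning signal. One common recipe for this (group-relative advantages, GRPO-style) is sketched below; the post does not say which algorithm was used, so treat this purely as an example of the general shape.

```python
from statistics import mean, pstdev


def group_advantages(rewards):
    """Turn a group of sampled-trace rewards into advantages by
    subtracting the group mean and normalizing by its std.

    This is one common RL-on-traces recipe, not necessarily the
    method used for the sentry-o4-mini training run.
    """
    mu = mean(rewards)
    sd = pstdev(rewards) or 1.0  # avoid division by zero on uniform groups
    return [(r - mu) / sd for r in rewards]


# Four sampled traces for the same task: two solved it, two did not.
adv = group_advantages([1.0, 0.0, 1.0, 0.0])
```

Traces that beat the group average are upweighted, failed ones downweighted; zero-centering keeps updates stable across tasks of varying difficulty.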
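Replayable trajectories imply that every observation and action gets persisted in order. A minimal sketch of that capture/replay loop, assuming a JSONL-style event log (the format is illustrative, not the environment's actual one):

```python
import io
import json


class TrajectoryRecorder:
    """Capture each step of an agent investigation so it can be
    replayed later for analysis. Event schema is illustrative."""

    def __init__(self):
        self.events = []

    def log(self, role: str, content: str):
        self.events.append({"step": len(self.events),
                            "role": role,
                            "content": content})

    def dump(self, fp):
        """Write one JSON object per line (JSONL)."""
        for event in self.events:
            fp.write(json.dumps(event) + "\n")

    @staticmethod
    def replay(fp):
        """Read events back in order from a JSONL stream."""
        return [json.loads(line) for line in fp]


rec = TrajectoryRecorder()
rec.log("env", "Users report 500s on checkout")
rec.log("agent", "sentry.search_issues(query='500')")

buf = io.StringIO()  # stands in for a file on disk
rec.dump(buf)
buf.seek(0)
events = TrajectoryRecorder.replay(buf)
```

An append-only, ordered log like this is what makes post-hoc inspection of a full investigation possible.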