Building an RL environment to train agents for production debugging
20 days ago
- #Ops Diagnostics
- #RL Environment
- #Automation
- Developed an RL environment for ops diagnostics, enabling agents to investigate across Sentry, Supabase, Railway, and Kubernetes.
- Trained a model on 24 real production tasks, achieving a 2x performance improvement over the base model.
- Engineers spend 10-20% of their time debugging production issues, motivating automated investigation tooling.
- Designed an architecture of specialized subagents (Sentry, Supabase, Kubernetes) rather than giving a single agent every tool, improving efficiency.
- Released the architecture as a public HUD environment called cross-service-diagnostics on GitHub.
- Trained the Sentry subagent on 24 diverse tasks from real production data, ensuring generalization.
- Used reinforcement learning to optimize the subagent; training took around 13 hours and produced 3,000+ traces.
- Achieved 2x better performance with the trained model (sentry-o4-mini) compared to the base model.
- The environment principles apply beyond ops diagnostics to any RL environment for tool-using agents.
- Full trajectories of agent investigations are captured and can be replayed for analysis.
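An environment like the one described usually exposes a reset/step loop to the agent: reset yields an investigation prompt, and each step either runs a tool call or accepts a final diagnosis. Here is a minimal sketch of that shape; the class, tool names, and reward logic are all hypothetical, not the actual cross-service-diagnostics API.

```python
import json


class DiagnosticsEnv:
    """Toy ops-diagnostics environment with a reset/step interface.

    Illustrative only: a real environment would query live services
    (Sentry, Supabase, Kubernetes) instead of returning canned output.
    """

    def __init__(self):
        # One hypothetical task; the post's environment has 24 real ones.
        self.tasks = [
            {"prompt": "Users report 500s on checkout",
             "root_cause": "db_connection_pool_exhausted"},
        ]
        self._task = None

    def reset(self, task_id: int = 0) -> str:
        """Start an investigation and return its prompt."""
        self._task = self.tasks[task_id]
        return self._task["prompt"]

    def step(self, action: dict):
        """action is either {"tool": ..., "args": ...} or {"answer": ...}.

        Returns (observation, reward, done): reward is granted only
        when the agent submits the correct root cause.
        """
        if "answer" in action:
            reward = 1.0 if action["answer"] == self._task["root_cause"] else 0.0
            return "", reward, True
        # Canned tool output standing in for a real service query.
        obs = json.dumps({"tool": action["tool"],
                          "result": "db_connection_pool_exhausted errors in recent logs"})
        return obs, 0.0, False


env = DiagnosticsEnv()
prompt = env.reset()
obs, r, done = env.step({"tool": "sentry.search_issues", "args": {"query": "500"}})
obs, r, done = env.step({"answer": "db_connection_pool_exhausted"})
```

The agent only scores when its final diagnosis matches ground truth, which is what makes tasks gradeable for RL.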
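The subagent split means each specialist sees only its own toolset, with an orchestrator deciding who handles a question. The subagent names below follow the post, but the keyword routing is a stand-in: a real orchestrator would let an LLM pick the specialist.

```python
# Each subagent gets a small, focused toolset instead of one agent
# holding every tool. Tool names here are hypothetical.
SUBAGENT_TOOLS = {
    "sentry": ["search_issues", "get_event", "list_releases"],
    "supabase": ["run_sql", "get_table_schema"],
    "kubernetes": ["get_pods", "get_logs", "describe_deployment"],
}


def route(query: str) -> str:
    """Pick a subagent by naive keyword match (illustrative only;
    in practice an LLM orchestrator would make this call)."""
    keywords = {
        "sentry": ["exception", "error rate", "stack trace"],
        "supabase": ["table", "query", "row"],
        "kubernetes": ["pod", "deployment", "crashloop"],
    }
    q = query.lower()
    for agent, words in keywords.items():
        if any(w in q for w in words):
            return agent
    return "sentry"  # default specialist for generic error reports
```

Keeping each context window focused on one service's tools is the efficiency win the post describes.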
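With 3,000+ traces collected, the RL step needs per-trace rewards turned into a learning signal. One common recipe for this (group-relative advantages, GRPO-style) is sketched below; the post does not say which algorithm was used, so treat this purely as an example of the general shape.

```python
from statistics import mean, pstdev


def group_advantages(rewards):
    """Turn a group of sampled-trace rewards into advantages by
    subtracting the group mean and normalizing by its std.

    This is one common RL-on-traces recipe, not necessarily the
    method used for the sentry-o4-mini training run.
    """
    mu = mean(rewards)
    sd = pstdev(rewards) or 1.0  # avoid division by zero on uniform groups
    return [(r - mu) / sd for r in rewards]


# Four sampled traces for the same task: two solved it, two did not.
adv = group_advantages([1.0, 0.0, 1.0, 0.0])
```

Traces that beat the group average are upweighted, failed ones downweighted; zero-centering keeps updates stable across tasks of varying difficulty.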
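Replayable trajectories imply that every observation and action gets persisted in order. A minimal sketch of that capture/replay loop, assuming a JSONL-style event log (the format is illustrative, not the environment's actual one):

```python
import io
import json


class TrajectoryRecorder:
    """Capture each step of an agent investigation so it can be
    replayed later for analysis. Event schema is illustrative."""

    def __init__(self):
        self.events = []

    def log(self, role: str, content: str):
        self.events.append({"step": len(self.events),
                            "role": role,
                            "content": content})

    def dump(self, fp):
        """Write one JSON object per line (JSONL)."""
        for event in self.events:
            fp.write(json.dumps(event) + "\n")

    @staticmethod
    def replay(fp):
        """Read events back in order from a JSONL stream."""
        return [json.loads(line) for line in fp]


rec = TrajectoryRecorder()
rec.log("env", "Users report 500s on checkout")
rec.log("agent", "sentry.search_issues(query='500')")

buf = io.StringIO()  # stands in for a file on disk
rec.dump(buf)
buf.seek(0)
events = TrajectoryRecorder.replay(buf)
```

An append-only, ordered log like this is what makes post-hoc inspection of a full investigation possible.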