Hasty Briefs

Building an RL environment to train agents for production debugging

a month ago
  • #Ops Diagnostics
  • #RL Environment
  • #Automation
  • Developed an RL environment for ops diagnostics, enabling agents to investigate across Sentry, Supabase, Railway, and Kubernetes.
  • Trained a model on 24 real production tasks, achieving 2x the performance of the base model.
  • Engineers spend 10-20% of their time debugging production bugs, which motivates automating the investigation.
  • Created an architecture with subagents (Sentry, Supabase, Kubernetes) instead of giving one agent all tools, improving efficiency.
  • Released the architecture as a public HUD environment called cross-service-diagnostics on GitHub.
  • Trained the Sentry subagent on 24 diverse tasks drawn from real production data, with the variety intended to help it generalize.
  • Used reinforcement learning to optimize the subagent; training took around 13 hours across 3,000+ traces.
  • Achieved 2x better performance with the trained model (sentry-o4-mini) compared to the base model.
  • The environment principles apply beyond ops diagnostics to any RL environment for tool-using agents.
  • Full trajectories of agent investigations are captured and can be replayed for analysis.
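
The subagent architecture and trajectory capture described above can be sketched roughly as follows. This is a minimal illustration, not the HUD SDK or the cross-service-diagnostics code: `Step`, `Subagent`, `Orchestrator`, `replay`, and the two stub tools are hypothetical names, and the tool results are made-up strings. It shows only the shape of the idea, where each subagent sees just its own service's tools and every tool call is appended to a shared trajectory that can be replayed later.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical types for illustration only; none of this is the HUD SDK
# or the cross-service-diagnostics API.
Tool = Callable[[str], str]


@dataclass
class Step:
    """One recorded tool call in an investigation trajectory."""
    agent: str
    tool: str
    query: str
    result: str


@dataclass
class Subagent:
    """A specialist that only sees the tools for its own service."""
    name: str
    tools: Dict[str, Tool]

    def investigate(self, query: str, trajectory: List[Step]) -> List[str]:
        findings = []
        for tool_name, tool in self.tools.items():
            result = tool(query)
            # Record every call so the full investigation can be replayed.
            trajectory.append(Step(self.name, tool_name, query, result))
            findings.append(result)
        return findings


@dataclass
class Orchestrator:
    """Delegates to subagents instead of holding every tool itself."""
    subagents: Dict[str, Subagent]
    trajectory: List[Step] = field(default_factory=list)

    def diagnose(self, query: str, services: List[str]) -> Dict[str, List[str]]:
        return {
            name: self.subagents[name].investigate(query, self.trajectory)
            for name in services
        }


def replay(trajectory: List[Step]) -> List[str]:
    """Render a captured trajectory as readable lines for later analysis."""
    return [
        f"{i}: [{s.agent}.{s.tool}] {s.query!r} -> {s.result}"
        for i, s in enumerate(trajectory, start=1)
    ]


# Stub tools standing in for real service calls (made-up data).
def search_issues(query: str) -> str:
    return f"sentry: 1 issue matching {query}"


def get_pod_logs(query: str) -> str:
    return f"kubernetes: no log lines matching {query}"


orchestrator = Orchestrator(
    subagents={
        "sentry": Subagent("sentry", {"search_issues": search_issues}),
        "kubernetes": Subagent("kubernetes", {"get_pod_logs": get_pod_logs}),
    }
)
report = orchestrator.diagnose("TimeoutError", ["sentry", "kubernetes"])
for line in replay(orchestrator.trajectory):
    print(line)
```

In a real environment the stub functions would wrap actual service clients, and a model would choose which tool to call rather than the loop calling each one in turn. Keeping the tool sets separate means each subagent can be trained and evaluated on its own, as the brief describes for the Sentry subagent.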