Hasty Briefs (beta)

Measuring AI Ability to Complete Long Tasks: Opus 4.5 has a 50% time horizon of 4h 49m

14 hours ago
  • #AI Performance
  • #Exponential Growth
  • #Task Automation
  • AI performance can be measured by the length of tasks, in terms of how long they take humans, that AI agents can complete. This length has grown exponentially, with a doubling time of around 7 months.
  • Current AI models excel at text prediction and knowledge tasks but struggle with longer, multi-step projects and with substituting for human labor.
  • Human task completion time strongly predicts AI success rates: the models evaluated in the study almost always succeed on tasks that take humans under 4 minutes but rarely succeed on tasks that take humans more than about 4 hours.
  • Historical data shows a consistent exponential increase in the length of tasks AI can complete, suggesting significant future capabilities.
  • If trends continue, AI could autonomously handle week-long tasks within a few years and month-long projects by the end of the decade.
  • The study emphasizes the importance of this metric for AI benchmarks, forecasting, and risk management due to its real-world relevance.
  • Open-source infrastructure and data are provided to encourage further research and replication of findings.
  • Potential implications include both significant benefits and risks, highlighting the need for preparedness in AI development and deployment.
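
To make the trend arithmetic concrete, here is a minimal Python sketch that extrapolates the headline figures. It uses only the 4h 49m horizon and the roughly 7-month doubling time quoted above and assumes the growth stays purely exponential. The 40-hour work week, the 167-hour work month, the logistic shape of the success curve, and its `slope` value are illustrative assumptions, not figures from the study.

```python
import math

DOUBLING_MONTHS = 7.0        # doubling time quoted in the summary
HORIZON_MIN = 4 * 60 + 49    # 50% horizon from the headline: 4h 49m = 289 minutes

# Assumptions for illustration only: how many working minutes a
# "week-long" or "month-long" human task contains.
WORK_WEEK_MIN = 40 * 60
WORK_MONTH_MIN = 167 * 60


def horizon_after(months, start_minutes=HORIZON_MIN, doubling_months=DOUBLING_MONTHS):
    """Projected 50% horizon in minutes after `months` of exponential growth."""
    return start_minutes * 2 ** (months / doubling_months)


def months_until(target_minutes, start_minutes=HORIZON_MIN, doubling_months=DOUBLING_MONTHS):
    """Months until the projected horizon reaches `target_minutes`."""
    return doubling_months * math.log2(target_minutes / start_minutes)


def success_probability(task_minutes, horizon_minutes=HORIZON_MIN, slope=1.0):
    """Hypothetical logistic success curve in log2(task length).

    Equals 0.5 when the task length matches the horizon; `slope` is an
    arbitrary illustrative value, not a fitted parameter.
    """
    x = slope * (math.log2(horizon_minutes) - math.log2(task_minutes))
    return 1.0 / (1.0 + math.exp(-x))


if __name__ == "__main__":
    print(f"Horizon in 12 months: {horizon_after(12) / 60:.1f} hours")
    print(f"Months to a work-week horizon: {months_until(WORK_WEEK_MIN):.1f}")
    print(f"Months to a work-month horizon: {months_until(WORK_MONTH_MIN):.1f}")
    print(f"P(success) on a 4-minute task: {success_probability(4):.3f}")
```

Under these assumptions the projection reaches a work-week horizon in roughly 21 months and a work-month horizon in roughly 36 months, in line with the "within a few years" and "by the end of the decade" claims in the summary. The result is sensitive to the doubling time and to how a work week or month is defined.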