Measuring AI Ability to Complete Long Tasks: Opus 4.5 has a 50% time horizon of 4h49m

14 hours ago

AI performance can be measured by the length of tasks, in human completion time, that AI agents can complete; this length has grown exponentially, with a doubling time of around 7 months.
Current AI models excel at text prediction and knowledge tasks but struggle to complete longer, multi-step projects or to substitute for human labor.
Human task completion time strongly predicts AI success rates: the models studied almost always succeed on tasks that take humans under 4 minutes but rarely succeed on tasks that take more than about 4 hours.
Historical data show a consistent exponential increase in the length of tasks AI can complete, suggesting substantially greater capabilities in the future.
If trends continue, AI could autonomously handle week-long tasks within a few years and month-long projects by the end of the decade.
The study emphasizes the importance of this metric for AI benchmarks, forecasting, and risk management due to its real-world relevance.
Open-source infrastructure and data are provided to encourage further research and replication of findings.
Potential implications include both significant benefits and risks, highlighting the need for preparedness in AI development and deployment.
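
The extrapolation behind the "week-long" and "month-long" claims above can be sketched with simple doubling arithmetic. This is a minimal illustration, not the study's own forecasting code. It assumes the headline 50% horizon of 4h49m as the starting point, a constant 7-month doubling time, and that a "week-long" task means 40 working hours and a "month-long" task about 167 working hours; those hour conversions and all function names are assumptions made for this example.

```python
import math

# Values taken from the summary above.
START_HORIZON_HOURS = 4 + 49 / 60   # headline 50% horizon of 4h49m
DOUBLING_MONTHS = 7.0               # "doubling time of around 7 months"

# Assumptions for illustration only: task lengths in working hours.
WORK_WEEK_HOURS = 40.0
WORK_MONTH_HOURS = 167.0


def horizon_after(months, start_hours=START_HORIZON_HOURS,
                  doubling_months=DOUBLING_MONTHS):
    """Projected horizon (hours) after `months`, assuming steady doubling."""
    return start_hours * 2 ** (months / doubling_months)


def months_until(target_hours, start_hours=START_HORIZON_HOURS,
                 doubling_months=DOUBLING_MONTHS):
    """Months needed for the horizon to grow from start_hours to target_hours."""
    return doubling_months * math.log2(target_hours / start_hours)


if __name__ == "__main__":
    for label, hours in [("week-long", WORK_WEEK_HOURS),
                         ("month-long", WORK_MONTH_HOURS)]:
        print(f"{label} tasks ({hours:.0f} h): "
              f"{months_until(hours):.1f} months from the starting point")
```

Under these assumptions the projection gives roughly 21 months to reach week-long tasks and roughly 36 months to reach month-long tasks. The result is sensitive to the doubling time and to how "week" and "month" are converted to hours, so it should be read as a rough trend line rather than a forecast.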