Measuring AI Ability to Complete Long Tasks: Opus 4.5 has a 50% time horizon of 4h 49m
- #AI Performance
- #Exponential Growth
- #Task Automation
- AI performance can be measured by the length of tasks (in human completion time) that AI agents can complete; this length has grown exponentially, doubling roughly every 7 months.
- Current AI models excel at text prediction and knowledge tasks but cannot yet reliably carry out long, multi-step projects or substitute for human labor.
- The time a task takes humans strongly predicts whether AI succeeds at it: the models in the original study succeeded almost always on tasks that take humans under 4 minutes, but less than 10% of the time on tasks that take more than about 4 hours.
- Historical data covering roughly the past six years show a consistent exponential increase in the length of tasks AI can complete; extrapolated, this trend implies agents able to handle much longer tasks in the near future.
- If trends continue, AI could autonomously handle week-long tasks within a few years and month-long projects by the end of the decade.
- The study emphasizes the importance of this metric for AI benchmarks, forecasting, and risk management due to its real-world relevance.
- Open-source infrastructure and data are provided to encourage further research and replication of findings.
- Potential implications include both significant benefits and risks, highlighting the need for preparedness in AI development and deployment.
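The time-horizon metric and the doubling-time extrapolation summarized above can be sketched in a few lines. This is a minimal illustration, not METR's code: it assumes the horizon is obtained by fitting a logistic curve of success probability against log2 of human completion time and reading off the task length where predicted success equals 50%, then assumes a constant doubling time to project forward. The function names (`fit_logistic`, `horizon_minutes`, `months_until`), the synthetic task data, and the 40-hour "work week" target are all hypothetical; 289 minutes is simply the 4h 49m from the title.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, lr=0.1, steps=20000):
    """Fit p(success) = sigmoid(a + b*x) by plain gradient descent on log-loss."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(a + b * x) - y  # d(log-loss)/d(logit) for one task
            grad_a += err
            grad_b += err * x
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def horizon_minutes(task_minutes, successes, p=0.5):
    """Hypothetical helper: task length (human minutes) where fitted success = p."""
    xs = [math.log2(m) for m in task_minutes]  # model success vs log2(time)
    a, b = fit_logistic(xs, successes)
    # Solve sigmoid(a + b*x) = p, i.e. a + b*x = logit(p), for x = log2(minutes).
    x_p = (math.log(p / (1 - p)) - a) / b
    return 2 ** x_p


def months_until(target_minutes, current_minutes, doubling_months=7.0):
    """Months until the horizon reaches the target, assuming constant doubling."""
    return doubling_months * math.log2(target_minutes / current_minutes)


if __name__ == "__main__":
    # Synthetic (made-up) results: 1 = agent succeeded, 0 = agent failed.
    minutes = [4, 8, 16, 32, 32, 64, 64, 128, 128, 256, 512, 1024]
    success = [1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
    print("50% horizon (min):", horizon_minutes(minutes, success))
    print("80% horizon (min):", horizon_minutes(minutes, success, p=0.8))
    # Naive projection from 4h 49m (289 min) to a hypothetical 40-hour week.
    print("months to a 40h task:", months_until(40 * 60, 289))
```

The synthetic data are symmetric about 2**6 minutes, so the fitted 50% horizon lands close to 64 minutes, and the 80% horizon is shorter because higher reliability is only reached on easier tasks. Under a 7-month doubling time, `months_until(40 * 60, 289)` comes out at roughly 21 months; since the result is linear in `doubling_months`, a different doubling rate scales the projection proportionally.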