Hasty Briefs


Measuring AI agent autonomy in practice

6 days ago
  • #AI autonomy
  • #human-agent interaction
  • #AI safety
  • AI agents are being deployed across varied contexts, from email triage to cyber espionage, but understanding of how autonomously they actually operate in practice remains limited.
  • Analysis of millions of human-agent interactions shows that Claude Code operates autonomously for increasingly long stretches, with average session durations nearly doubling over three months.
  • Experienced users in Claude Code auto-approve actions more frequently but also interrupt more often, indicating a shift in oversight strategy.
  • Claude Code pauses for clarification more often than humans interrupt it, especially in complex tasks, suggesting built-in safety mechanisms.
  • Agents are used in risky domains like healthcare, finance, and cybersecurity, but most actions remain low-risk and reversible, with software engineering dominating usage.
  • Effective oversight of AI agents requires new monitoring infrastructure and human-AI interaction paradigms to manage autonomy and risk collaboratively.
  • The study highlights the need for post-deployment monitoring, training models to recognize uncertainty, and designing products for user oversight without mandating specific interaction patterns.
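The oversight signals described above (session length, auto-approval, interruptions, and agent-initiated clarifications) can be aggregated into simple per-deployment metrics. A minimal sketch, assuming a hypothetical log schema where each session records these counts (the `Session` fields and `autonomy_metrics` function are illustrative, not from the study):

```python
from dataclasses import dataclass

@dataclass
class Session:
    duration_min: float   # wall-clock length of the agent session
    actions: int          # total actions the agent proposed
    auto_approved: int    # actions the user let run without review
    interruptions: int    # times the user stopped the agent mid-task
    clarifications: int   # times the agent paused to ask the user

def autonomy_metrics(sessions: list[Session]) -> dict[str, float]:
    """Aggregate simple oversight metrics across a batch of sessions."""
    total_actions = sum(s.actions for s in sessions)
    n = len(sessions)
    return {
        "avg_duration_min": sum(s.duration_min for s in sessions) / n,
        "auto_approval_rate": sum(s.auto_approved for s in sessions) / total_actions,
        "interruptions_per_session": sum(s.interruptions for s in sessions) / n,
        "clarifications_per_session": sum(s.clarifications for s in sessions) / n,
    }

# Hypothetical example logs, not real data from the study.
logs = [
    Session(12.0, 20, 15, 1, 2),
    Session(30.0, 40, 36, 3, 4),
]
m = autonomy_metrics(logs)
print(m["auto_approval_rate"])          # 51/60 = 0.85
print(m["clarifications_per_session"])  # 3.0, exceeding interruptions_per_session (2.0)
```

Tracking clarifications alongside interruptions matters because, per the findings above, the agent asking for guidance can outpace humans stepping in, so a monitoring dashboard that counts only interruptions would understate how often control actually changes hands.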