The latest AI scaling graph – and why it hardly makes sense
a year ago
- #AI
- #Critique
- #Scaling
- METR published a study on AI performance in software-related tasks, leading to a viral graph.
- The graph's y-axis measures AI performance by the time humans need to solve the same tasks, a metric criticized as arbitrary and flawed.
- METR's technical report was careful, but social media posts exaggerated the findings beyond the study's scope.
- The dataset of software tasks was well-constructed but may not generalize to other cognitive domains.
- Extrapolating future AI capabilities from the graph is misguided; the assumption of continued exponential growth is unreliable.
- Confirmation bias and hype around such graphs are more prevalent among investors than among builders in the AI field.