ARC-AGI-2 human baseline surpassed

a day ago

The human baseline for ARC-AGI-2 is low, possibly because tasks are either well-posed but difficult, or ill-posed with multiple valid solutions.
Some tasks in ARC-AGI-2 were not solved by any participant, which suggests they may be too hard or may require specific prior experience to solve.
The reported 100% human score for ARC-AGI-2 means every task was solved by at least two humans, not that any single human solved all tasks.
Average human performance on ARC-AGI-2 tasks was around 53%, and AI systems such as GPT-5.2 and Gemini 3 Pro have surpassed this average.
Human participants came from diverse backgrounds and were paid $5 per task plus a show-up fee; the AI systems achieved better cost-efficiency.
ARC-AGI-2 tasks require more deliberate thinking than ARC-AGI-1 tasks, with an average completion time of 2.7 minutes per task.
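The gap between the 100% panel score and the roughly 53% average comes from two different aggregates over the same results. The Python sketch below uses a small, entirely hypothetical results table. It has four participants and four tasks, and every participant attempts every task, which is a simplification of how the real study was run. The sketch computes both aggregates: the share of tasks solved by at least two participants (the threshold described above) and the mean per-participant accuracy. The names `panel_coverage` and `mean_individual_accuracy` are illustrative only.

```python
from typing import Dict, List

# Hypothetical data only (not real ARC-AGI-2 results): which tasks each
# participant solved. Simplifying assumption: everyone attempts every task.
results: Dict[str, List[bool]] = {
    "p1": [True, True, False, False],
    "p2": [True, False, True, False],
    "p3": [False, True, True, True],
    "p4": [False, False, False, True],
}

def panel_coverage(results: Dict[str, List[bool]], min_solvers: int = 2) -> float:
    """Fraction of tasks solved by at least `min_solvers` participants."""
    n_tasks = len(next(iter(results.values())))
    covered = 0
    for t in range(n_tasks):
        solvers = sum(1 for solved in results.values() if solved[t])
        if solvers >= min_solvers:
            covered += 1
    return covered / n_tasks

def mean_individual_accuracy(results: Dict[str, List[bool]]) -> float:
    """Average, over participants, of the fraction of tasks each one solved."""
    per_person = [sum(solved) / len(solved) for solved in results.values()]
    return sum(per_person) / len(per_person)

print(f"panel coverage: {panel_coverage(results):.0%}")             # 100%
print(f"mean accuracy:  {mean_individual_accuracy(results):.0%}")   # 50%
# No single participant solved every task, yet every task is "covered".
print("anyone solved all:", any(all(s) for s in results.values()))  # False
```

In this toy table every task has exactly two solvers, so panel coverage is 100%. No participant solved more than three of the four tasks, however, and the mean accuracy is only 50%. Both numbers are honest summaries of the same data; they simply answer different questions.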

Hasty Briefs (beta)