ARC-AGI-2 human baseline surpassed
a day ago
- #Human Performance
- #AI Benchmarking
- #ARC-AGI-2
- The human baseline for ARC-AGI-2 is low, possibly because some tasks are well-posed but difficult, while others are ill-posed and admit multiple valid solutions.
- Some tasks in ARC-AGI-2 were not solved by any participants, indicating they may be too hard or require specific experiences to solve.
- The reported 100% human score for ARC-AGI-2 means every task was solved by at least two humans, not that any single human solved all tasks.
- Average human performance on ARC-AGI-2 tasks was around 53%, a figure that AI systems such as GPT-5.2 and Gemini 3 Pro have surpassed.
- Human participants came from diverse backgrounds and were paid $5 per task plus a show-up fee, while AI systems achieved better cost-efficiency.
- ARC-AGI-2 tasks require more deliberate thinking than ARC-AGI-1 tasks, with an average completion time of 2.7 minutes per task.
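
The gap between the "100%" panel score and the roughly 53% average comes from two different ways of aggregating the same attempts. The sketch below illustrates that arithmetic on made-up data. The task names, the outcome lists, and the function names `panel_score` and `average_success_rate` are all hypothetical, and the actual ARC-AGI-2 scoring procedure may differ in its details.

```python
from typing import Dict, List

# Hypothetical attempt records (NOT real ARC-AGI-2 data):
# task id -> one pass/fail outcome per participant attempt.
attempts: Dict[str, List[bool]] = {
    "task_a": [True, True, False, False],
    "task_b": [True, True, True, False],
    "task_c": [False, True, True, False],
}


def panel_score(records: Dict[str, List[bool]], min_solvers: int = 2) -> float:
    """Fraction of tasks solved by at least `min_solvers` participants."""
    solved = sum(1 for outcomes in records.values() if sum(outcomes) >= min_solvers)
    return solved / len(records)


def average_success_rate(records: Dict[str, List[bool]]) -> float:
    """Fraction of all individual attempts that succeeded."""
    total = sum(len(outcomes) for outcomes in records.values())
    passed = sum(sum(outcomes) for outcomes in records.values())
    return passed / total


print(f"panel score:   {panel_score(attempts):.0%}")
print(f"average score: {average_success_rate(attempts):.0%}")
```

In this toy data every task has at least two solvers, so the panel score is 100% even though only 7 of the 12 individual attempts succeed. No single participant needs to solve everything for the panel figure to reach its maximum.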