Hasty Briefs (beta)

  • #Human Performance
  • #AI Benchmarking
  • #ARC-AGI-2
  • The human baseline for ARC-AGI-2 is low, possibly because tasks are either well-posed but genuinely difficult, or ill-posed with multiple valid solutions.
  • Some ARC-AGI-2 tasks were not solved by any participant, suggesting they may be too hard or may require specific prior experience to solve.
  • The reported 100% human score for ARC-AGI-2 means every task was solved by at least two humans, not that any single human solved all tasks.
  • Average human performance on ARC-AGI-2 tasks was around 53%, a level that AI systems such as GPT-5.2 and Gemini 3 Pro have surpassed.
  • Human participants were diverse in background and paid $5 per task plus a show-up fee, while AI systems achieved better cost-efficiency.
  • ARC-AGI-2 tasks require more deliberate thinking compared to ARC-AGI-1, with an average completion time of 2.7 minutes per task.
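The distinction between the two human-score metrics above can be sketched in code. This is a minimal illustration with invented data, not the actual ARC-AGI-2 evaluation pipeline: the task IDs, participant IDs, and attempt counts are hypothetical.

```python
# Hypothetical data: task id -> set of participants who solved it.
solves = {
    "t1": {"p1", "p2", "p3"},  # solved by three participants
    "t2": {"p1", "p2"},        # solved by exactly two
    "t3": {"p4", "p5"},
    "t4": {"p1"},              # solved by only one participant
}
attempts_per_task = 5  # assume each task was attempted by 5 participants

# Metric 1 ("100% human score" style): share of tasks solved by
# at least two humans -- a task-level solvability criterion.
coverage = sum(len(s) >= 2 for s in solves.values()) / len(solves)

# Metric 2 (average human performance): mean per-task solve rate
# across all attempts -- how often a typical attempt succeeds.
avg_solve_rate = sum(len(s) / attempts_per_task
                     for s in solves.values()) / len(solves)

print(f"coverage (>=2 solvers): {coverage:.0%}")        # 3 of 4 tasks -> 75%
print(f"average solve rate:     {avg_solve_rate:.0%}")  # 1.6 / 4 tasks -> 40%
```

With these invented numbers, every task but one clears the two-solver bar, yet the average attempt succeeds far less often, which is how a benchmark can report 100% task solvability alongside a ~53% mean human score.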