ARC-AGI 2025: A research review


ARC-AGI 2025 is a benchmark for testing 'skill-acquisition efficiency', derived from François Chollet's work on measuring intelligence.
The competition requires solving grid-based puzzles by inferring rules from examples and applying them to unseen test grids, emphasizing out-of-domain generalization.
Efficiency is a key component of the challenge, with strict compute bounds to prevent brute-force solutions.
Approaches to ARC include discrete program search and deep-learning-guided program synthesis, with LLM-based methods becoming prominent in 2024.
Test-time training (TTT), a form of test-time adaptation, is crucial for success in ARC: the model is fine-tuned on each new puzzle's demonstration pairs during evaluation.
The 2025 version, ARC-AGI-2, introduces harder problems: tasks solvable by brute force were removed and new ones were added to test generalization.
Core knowledge priors, such as objectness and elementary physics, are foundational to solving ARC puzzles.
The choice of grid representation and of domain-specific language (DSL) is critical for efficient program search and solution generation.
Ensembling different methods, combining inductive approaches (which infer an explicit program) with transductive ones (which predict the output grid directly), has proven effective in improving scores.
Recent advances include reasoning ('thinking') models such as OpenAI's o3, which solve puzzles through in-context learning and extended reasoning.
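
The rule-inference setup (infer a rule from demonstration pairs, then apply it to unseen test grids) can be made concrete with a toy task. In this minimal sketch only the `train`/`test` and `input`/`output` layout mirrors the public ARC JSON files; the task itself, `flip_horizontal`, and `fits_all_examples` are hypothetical illustrations (Python 3.9+).

```python
from typing import Callable

Grid = list[list[int]]  # each cell holds a color index 0-9

# Toy task in the JSON layout of the public ARC datasets: "train" holds
# demonstration pairs, "test" holds the inputs whose outputs must be predicted.
task = {
    "train": [
        {"input": [[1, 0], [0, 0]], "output": [[0, 1], [0, 0]]},
        {"input": [[2, 3], [0, 0]], "output": [[3, 2], [0, 0]]},
    ],
    "test": [{"input": [[0, 5], [4, 0]]}],
}


def flip_horizontal(grid: Grid) -> Grid:
    """Candidate rule (hypothetical): mirror every row left to right."""
    return [list(reversed(row)) for row in grid]


def fits_all_examples(rule: Callable[[Grid], Grid], task: dict) -> bool:
    """Accept a rule only if it reproduces every demonstration output exactly."""
    return all(rule(pair["input"]) == pair["output"] for pair in task["train"])


prediction = None
if fits_all_examples(flip_horizontal, task):
    prediction = flip_horizontal(task["test"][0]["input"])
print(prediction)  # [[5, 0], [0, 4]]
```

The exact-match check is what makes ARC strict: a rule that fits most demonstrations but misses one cell is rejected.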
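
Discrete program search can be sketched as brute-force enumeration over a DSL, bounded by a depth limit that plays the role of a compute budget. Everything here (the four primitives, `run`, `search`, `max_depth`) is a hypothetical illustration, not any particular competition entry.

```python
from itertools import product

Grid = tuple[tuple[int, ...], ...]


# A hypothetical four-primitive DSL; practical ARC DSLs are far larger.
def rot90(g: Grid) -> Grid:
    """Rotate 90 degrees clockwise."""
    return tuple(zip(*g[::-1]))


def flip_h(g: Grid) -> Grid:
    """Mirror left to right."""
    return tuple(row[::-1] for row in g)


def flip_v(g: Grid) -> Grid:
    """Mirror top to bottom."""
    return g[::-1]


def transpose(g: Grid) -> Grid:
    """Swap rows and columns."""
    return tuple(zip(*g))


PRIMITIVES = {"rot90": rot90, "flip_h": flip_h, "flip_v": flip_v, "transpose": transpose}


def run(program: tuple[str, ...], g: Grid) -> Grid:
    """Apply the named primitives left to right."""
    for name in program:
        g = PRIMITIVES[name](g)
    return g


def search(train_pairs, max_depth: int = 3):
    """Enumerate programs by increasing length; return the first that fits all pairs.

    max_depth acts as the compute bound: candidates grow as 4**depth here.
    """
    for depth in range(1, max_depth + 1):
        for program in product(PRIMITIVES, repeat=depth):
            if all(run(program, x) == y for x, y in train_pairs):
                return program
    return None


# A 180-degree rotation is not a single primitive, so depth 2 is needed.
pairs = [(((1, 2, 3), (4, 5, 6)), ((6, 5, 4), (3, 2, 1)))]
print(search(pairs))  # ('rot90', 'rot90')
```

The exponential growth in candidates is why guidance matters: deep-learning-guided synthesis replaces the blind `product` loop with a model that proposes likely primitives first.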
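
For test-time training, one common recipe (assumed here) turns a test task's own demonstrations into extra training data through leave-one-out splits and geometric augmentations. The helpers below are hypothetical, and the fine-tuning step itself is omitted because it depends on the model being adapted.

```python
Grid = tuple[tuple[int, ...], ...]


def rotate(g: Grid) -> Grid:
    """Rotate 90 degrees clockwise."""
    return tuple(zip(*g[::-1]))


def mirror(g: Grid) -> Grid:
    """Mirror left to right."""
    return tuple(row[::-1] for row in g)


def dihedral(g, k: int) -> Grid:
    """Apply symmetry k of the square (0-3: rotations, 4-7: rotation then mirror)."""
    g = tuple(tuple(row) for row in g)
    for _ in range(k % 4):
        g = rotate(g)
    return mirror(g) if k >= 4 else g


def leave_one_out(pairs):
    """Hold out each demonstration in turn; the remaining pairs become context."""
    return [(pairs[:i] + pairs[i + 1:], pairs[i]) for i in range(len(pairs))]


def build_ttt_dataset(pairs):
    """Build synthetic training tasks from one test task's own demonstrations."""
    dataset = []
    for context, (x, y) in leave_one_out(pairs):
        for k in range(8):
            # Transforming every grid the same way keeps one shared (transformed) rule.
            aug_context = [(dihedral(a, k), dihedral(b, k)) for a, b in context]
            dataset.append((aug_context, (dihedral(x, k), dihedral(y, k))))
    return dataset


demos = [
    (((1, 2),), ((2, 1),)),
    (((3, 4),), ((4, 3),)),
    (((5, 6),), ((6, 5),)),
]
print(len(build_ttt_dataset(demos)))  # 24: 3 held-out choices x 8 symmetries
```

Applying the same symmetry to inputs and outputs is the key design choice: if `y = f(x)` for every pair, the transformed pairs still share a single rule, so the augmented tasks remain solvable.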
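
The objectness prior is often operationalized as connected-component extraction. This sketch assumes color 0 is background and that only orthogonally adjacent (4-connected) cells of one color form an object; both are simplifying choices, and alternatives such as 8-connectivity or multicolor objects are also used. `find_objects` is a hypothetical helper.

```python
from collections import deque


def find_objects(grid, background: int = 0):
    """Group same-colored, 4-connected, non-background cells into objects."""
    height, width = len(grid), len(grid[0])
    seen = [[False] * width for _ in range(height)]
    objects = []
    for r in range(height):
        for c in range(width):
            if seen[r][c] or grid[r][c] == background:
                continue
            color = grid[r][c]
            cells = []
            queue = deque([(r, c)])
            seen[r][c] = True
            # Breadth-first flood fill from the first unvisited cell of this object.
            while queue:
                y, x = queue.popleft()
                cells.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx] and grid[ny][nx] == color):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            objects.append({"color": color, "cells": cells})
    return objects


grid = [
    [1, 1, 0, 2],
    [0, 1, 0, 2],
    [3, 0, 0, 0],
]
for obj in find_objects(grid):
    print(obj["color"], len(obj["cells"]))
# 1 3
# 2 2
# 3 1
```

Once cells are grouped this way, a solver can reason about objects (count them, move them, recolor the largest) instead of individual pixels.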
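
For LLM-based solution generation, representation starts with how a grid becomes text. A sketch of one plain serialization, assumed for illustration: the helper names and the prompt wording are hypothetical, and published systems differ in delimiters and layout.

```python
def grid_to_text(grid) -> str:
    """One line per row, one digit per cell (ARC colors are 0-9)."""
    return "\n".join("".join(str(v) for v in row) for row in grid)


def text_to_grid(text: str):
    """Parse the format back; raises ValueError on any non-digit character."""
    return [[int(ch) for ch in line] for line in text.strip().splitlines()]


def task_to_prompt(train_pairs, test_input) -> str:
    """Lay out demonstrations, then the test input, ending where the answer goes."""
    parts = [
        f"Example {i}\nInput:\n{grid_to_text(x)}\nOutput:\n{grid_to_text(y)}"
        for i, (x, y) in enumerate(train_pairs, start=1)
    ]
    parts.append(f"Test\nInput:\n{grid_to_text(test_input)}\nOutput:\n")
    return "\n\n".join(parts)


print(task_to_prompt([([[1, 0]], [[0, 1]])], [[2, 0]]))
```

Keeping the format trivially invertible matters as much as the prompt: `text_to_grid` lets a model's completion be parsed and checked cell by cell.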
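
One simple way to ensemble inductive and transductive solvers is majority voting over their predicted grids. In this minimal sketch the solver names are placeholders, and the default of two attempts assumes the two-guesses-per-test-input format of the ARC Prize; other combination rules, such as weighting by model confidence, are also used.

```python
from collections import Counter


def ensemble(candidates, attempts: int = 2):
    """Rank candidate output grids by how many solvers proposed them.

    candidates: (solver_name, grid) tuples, listed in priority order.
    Ties are broken by the position where a grid first appeared.
    """
    votes = Counter()
    first_seen = {}
    for position, (_solver, grid) in enumerate(candidates):
        key = tuple(map(tuple, grid))  # lists are unhashable; tuples can be counted
        votes[key] += 1
        first_seen.setdefault(key, position)
    ranked = sorted(votes, key=lambda k: (-votes[k], first_seen[k]))
    return [[list(row) for row in key] for key in ranked[:attempts]]


candidates = [
    ("induction_search", [[1]]),
    ("transduction_model", [[2]]),
    ("transduction_model_augmented", [[2]]),
    ("induction_llm", [[3]]),
]
print(ensemble(candidates))  # [[[2]], [[1]]]
```

Listing candidates in priority order means that, when votes tie, the solver trusted most gets the earlier attempt.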
