ARC-AGI 2025: A research review
10 months ago
- #Artificial General Intelligence
- #Machine Learning
- #Program Synthesis
- ARC 2025 is a benchmark for testing 'skill acquisition efficiency', derived from François Chollet's work on measuring intelligence.
- The competition requires solving grid-based puzzles by inferring rules from examples and applying them to unseen test grids, emphasizing out-of-domain generalization.
- Efficiency is a key component of the challenge, with strict compute bounds to prevent brute-force solutions.
- Approaches to solving ARC include discrete program search and deep learning-guided program synthesis, with LLMs becoming significant in 2024.
- Test-time adaptation, most notably test-time training (TTT), is crucial for success in ARC, allowing models to adapt to new puzzles during evaluation.
- The 2025 version of ARC introduces harder problems, removing tasks solvable by brute force and adding new challenges to test generalization.
- Core knowledge priors, such as objectness and elementary physics, are foundational to solving ARC puzzles.
- Representation of grid data and domain-specific languages (DSLs) are critical for efficient program search and solution generation.
- Ensembling different methods, including both inductive and transductive approaches, has proven effective in improving scores.
- Recent advancements include the use of 'thinking' models such as o3, which leverage in-context learning and reasoning to solve puzzles.
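
To make the points about grid representation, DSLs, and discrete program search concrete, here is a minimal sketch in Python. It is illustrative only: the four primitives, the toy task, and the names `PRIMITIVES`, `run_program`, and `search` are assumptions made up for this example, not the DSL or search procedure of any actual ARC entry. Grids are stored as immutable tuples of colour indices, and the search enumerates short compositions of primitives until one reproduces every training pair.

```python
from itertools import product
from typing import Dict, Callable, Optional, Sequence, Tuple

# A grid is an immutable tuple of rows; each cell is a colour index (0-9 in ARC).
Grid = Tuple[Tuple[int, ...], ...]


def to_grid(rows: Sequence[Sequence[int]]) -> Grid:
    """Convert nested lists into the immutable grid representation."""
    return tuple(tuple(row) for row in rows)


# Hypothetical toy DSL: a handful of whole-grid transformations.
def identity(g: Grid) -> Grid:
    return g


def flip_h(g: Grid) -> Grid:
    return tuple(row[::-1] for row in g)  # mirror left-right


def flip_v(g: Grid) -> Grid:
    return g[::-1]  # mirror top-bottom


def transpose(g: Grid) -> Grid:
    return tuple(zip(*g))  # swap rows and columns


PRIMITIVES: Dict[str, Callable[[Grid], Grid]] = {
    "identity": identity,
    "flip_h": flip_h,
    "flip_v": flip_v,
    "transpose": transpose,
}


def run_program(names: Sequence[str], grid: Grid) -> Grid:
    """Apply the named primitives to the grid, left to right."""
    for name in names:
        grid = PRIMITIVES[name](grid)
    return grid


def search(train_pairs: Sequence[Tuple[Grid, Grid]],
           max_depth: int = 3) -> Optional[Tuple[str, ...]]:
    """Return the shortest primitive sequence consistent with all training pairs."""
    for depth in range(1, max_depth + 1):
        for names in product(PRIMITIVES, repeat=depth):
            if all(run_program(names, x) == y for x, y in train_pairs):
                return names
    return None  # nothing found within the depth budget


# Toy task (made up for illustration): rotate the grid 90 degrees clockwise.
train = [
    (to_grid([[1, 2], [3, 4]]), to_grid([[3, 1], [4, 2]])),
    (to_grid([[5, 0], [0, 6]]), to_grid([[0, 5], [6, 0]])),
]

program = search(train)
test_input = to_grid([[7, 8], [9, 0]])
prediction = run_program(program, test_input)
```

The number of candidate programs grows exponentially with depth (here 4^depth), which is why the choice of primitives and some form of learned guidance matter once the DSL is realistic, and why compute bounds rule out naive enumeration on harder tasks.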