ARC-AGI-3

ARC-AGI-3 is an interactive reasoning benchmark for measuring human-like intelligence in AI agents.
It challenges AI agents to explore novel environments, acquire goals dynamically, and adapt strategies without relying on pre-loaded knowledge or natural-language instructions.
A 100% score indicates AI agents can solve tasks as efficiently as humans.
The key aspects of intelligence it measures include skill-acquisition efficiency, long-horizon planning, and experience-driven adaptation.
The benchmark makes the gap between AI and human learning measurable by testing intelligence across time, not just final answers.
Design principles emphasize ease of human use, clear goals, meaningful feedback, and novelty to prevent memorization.
Features include replayable runs, a transparent evaluation UI, and a developer toolkit for integrating, testing, and iterating on agents through an interactive UI.
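The brief does not say how the percentage score is computed. One plausible reading of "a 100% score indicates AI agents can solve tasks as efficiently as humans" is an action-efficiency ratio against a human baseline, averaged over tasks. The sketch below assumes exactly that. `TaskResult`, `task_efficiency`, and `benchmark_score` are hypothetical names, and the capping at 1.0 and the plain averaging are illustrative assumptions, not the benchmark's official formula.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    # Hypothetical record of one task attempt (not an official ARC-AGI-3 type).
    solved: bool
    agent_actions: int  # actions the agent needed to finish the task
    human_actions: int  # assumed human baseline action count for the same task


def task_efficiency(r: TaskResult) -> float:
    """Per-task score in [0, 1]: 0 if unsolved, else human/agent actions, capped at 1.

    Assumption: matching or beating the human action count earns full credit.
    """
    if not r.solved or r.agent_actions <= 0:
        return 0.0
    return min(1.0, r.human_actions / r.agent_actions)


def benchmark_score(results: list[TaskResult]) -> float:
    """Mean per-task efficiency as a percentage (hypothetical aggregation).

    Under these assumptions, 100.0 means human-level efficiency on every task.
    """
    if not results:
        return 0.0
    return 100.0 * sum(task_efficiency(r) for r in results) / len(results)


results = [
    TaskResult(solved=True, agent_actions=20, human_actions=20),   # human-level: 1.0
    TaskResult(solved=True, agent_actions=40, human_actions=20),   # twice as many actions: 0.5
    TaskResult(solved=False, agent_actions=100, human_actions=30), # unsolved: 0.0
]
print(benchmark_score(results))  # 50.0
```

Scoring by action count rather than by final answer alone is one way to reflect the brief's point that intelligence is measured across time: an agent that eventually solves a task by brute-force exploration scores lower than one that learns the environment quickly.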