HRM Analysis by ARC Prize Organizers
- #Hierarchical Reasoning
- #AI Research
- #ARC-AGI Benchmark
- The Hierarchical Reasoning Model (HRM) paper by Guan Wang et al. was published in June 2025 and quickly gained significant attention in the AI community.
- HRM, a brain-inspired architecture, reported 41% on ARC-AGI-1 using only ~1,000 training tasks and a 27M-parameter model.
- Verification on the ARC-AGI-1 Semi-Private set scored HRM at 32%, a noticeable drop from the claimed 41% but still strong performance for a model of this size.
- HRM performs iterative refinement via 'thinking' bursts: a slow high-level planner module (H) and a fast low-level worker module (L) repeatedly update a shared hidden state (a minimal sketch of this loop follows the list).
- Key components like the outer refinement loop and data augmentation significantly boost HRM's performance, with the refinement loop being particularly impactful.
- Ablation studies revealed that a regular transformer could nearly match HRM's performance, suggesting HRM's architecture isn't the sole driver of its success.
- HRM's approach resembles zero-pretraining test-time training, similar to Liao and Gu's 'ARC-AGI without pretraining', focusing on task-specific learning.
- Data augmentation is crucial for HRM, but performance saturates after a relatively small number of augmentations per task, indicating diminishing returns beyond that point (see the augmentation sketch below).
- HRM's reliance on learned puzzle_id embeddings limits it to tasks seen during training, since an unseen task has no embedding to look up; this poses a challenge for generalization (see the embedding sketch below).
- Open questions remain about HRM's generalization, the role of puzzle_id embeddings, and potential improvements with few-shot contexts.
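
The snippet below is a minimal sketch of the slow-planner / fast-worker update pattern described above, wrapped in an outer refinement loop. The module choices (GRU cells), dimensions, and step counts are illustrative assumptions, not the authors' implementation; the point is only the nested slow/fast update schedule over a shared hidden state.

```python
import torch
import torch.nn as nn

class HRMSketch(nn.Module):
    """Sketch of an HRM-style update: a slow planner (H) and a fast worker (L)
    repeatedly refine a shared hidden state over several 'thinking' bursts,
    wrapped in an outer refinement loop. All sizes/steps are illustrative."""

    def __init__(self, d_model=256, h_steps=2, l_steps=4, outer_steps=8):
        super().__init__()
        self.h_module = nn.GRUCell(d_model, d_model)   # slow, high-level planner
        self.l_module = nn.GRUCell(d_model, d_model)   # fast, low-level worker
        self.readout = nn.Linear(d_model, d_model)     # maps state to a prediction
        self.h_steps, self.l_steps, self.outer_steps = h_steps, l_steps, outer_steps

    def forward(self, x):
        # x: (batch, d_model) task encoding; both modules update the same state
        state = torch.zeros_like(x)
        for _ in range(self.outer_steps):               # outer refinement loop
            for _ in range(self.h_steps):               # slow planner updates
                state = self.h_module(x, state)
                for _ in range(self.l_steps):           # fast worker updates
                    state = self.l_module(x, state)
        return self.readout(state)                      # refined prediction

# usage: refine a batch of 8 task encodings
# out = HRMSketch()(torch.randn(8, 256))
```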
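Next, a rough sketch of the kind of per-task augmentation referred to above: random dihedral transforms (rotations/flips) plus color permutations applied to every grid in a task. The function name, transform set, and default count are assumptions for illustration, not the exact augmentation pipeline used by HRM.

```python
import random
import numpy as np

def augment_task(grids, n_augmentations=300, n_colors=10):
    """Generate augmented copies of an ARC task's grids by applying a random
    rotation, optional flip, and random color permutation to each grid."""
    augmented = []
    for _ in range(n_augmentations):
        k = random.randrange(4)                 # rotate by 0/90/180/270 degrees
        flip = random.random() < 0.5            # optional horizontal flip
        perm = np.random.permutation(n_colors)  # random recoloring of the palette
        new_grids = []
        for g in grids:
            g = np.rot90(np.asarray(g), k)
            if flip:
                g = np.fliplr(g)
            new_grids.append(perm[g])           # remap colors via fancy indexing
        augmented.append(new_grids)
    return augmented
```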
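Finally, a sketch of why puzzle_id conditioning ties the model to seen tasks: each training task id indexes a learned embedding table, so a task that was never assigned an id at training time has no representation to look up. The class, layer choices, and table size are hypothetical stand-ins for the real architecture.

```python
import torch
import torch.nn as nn

class PuzzleConditionedModel(nn.Module):
    """Sketch of puzzle_id conditioning: a learned per-task embedding is mixed
    into the input encoding, so only tasks with a training-time id are covered."""

    def __init__(self, num_train_puzzles=1000, d_model=256):
        super().__init__()
        self.puzzle_embedding = nn.Embedding(num_train_puzzles, d_model)
        self.backbone = nn.Linear(d_model, d_model)  # stand-in for the HRM core

    def forward(self, x, puzzle_id):
        # x: (batch, d_model) grid encoding; puzzle_id: (batch,) long tensor
        cond = self.puzzle_embedding(puzzle_id)      # no entry exists for unseen tasks
        return self.backbone(x + cond)
```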