HRM Analysis by ARC Prize Organizers
- #Hierarchical Reasoning
- #AI Research
- #ARC-AGI Benchmark
- The Hierarchical Reasoning Model (HRM) paper by Guan Wang et al. was published in June 2025 and quickly gained significant attention in the AI community.
- HRM, a brain-inspired architecture, reported 41% on ARC-AGI-1 using only ~1,000 training tasks and a 27M-parameter model.
- Verification on the ARC-AGI-1 Semi-Private set scored HRM at 32%, a noticeable drop from the claimed 41% but still strong performance for a model of this size.
- HRM performs iterative refinement via 'thinking' bursts: a slow high-level planner module (H) and a fast low-level worker module (L) repeatedly update a shared hidden state (a minimal sketch of this loop follows the list).
- Key components like the outer refinement loop and data augmentation significantly boost HRM's performance, with the refinement loop being particularly impactful.
- Ablation studies revealed that a regular transformer could nearly match HRM's performance, suggesting HRM's architecture isn't the sole driver of its success.
- HRM's approach resembles zero-pretraining test-time training, similar to Liao and Gu's 'ARC-AGI without pretraining', focusing on task-specific learning.
- Data augmentation is crucial for HRM, but performance saturates after a relatively small number of augmentations per task, indicating diminishing returns beyond that point (see the augmentation sketch below).
- HRM's reliance on learned puzzle_id embeddings limits it to tasks seen during training, since an unseen task has no embedding to look up; this poses a challenge for generalization (see the embedding sketch below).
- Open questions remain about HRM's generalization, the role of puzzle_id embeddings, and potential improvements with few-shot contexts.
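
The snippet below is a minimal sketch of the slow-planner / fast-worker update pattern described above, wrapped in an outer refinement loop. The module choices (GRU cells), dimensions, and step counts are illustrative assumptions, not the authors' implementation; the point is only the nested slow/fast update schedule over a shared hidden state.

```python
import torch
import torch.nn as nn

class HRMSketch(nn.Module):
    """Sketch of an HRM-style update: a slow planner (H) and a fast worker (L)
    repeatedly refine a shared hidden state over several 'thinking' bursts,
    wrapped in an outer refinement loop. All sizes/steps are illustrative."""

    def __init__(self, d_model=256, h_steps=2, l_steps=4, outer_steps=8):
        super().__init__()
        self.h_module = nn.GRUCell(d_model, d_model)   # slow, high-level planner
        self.l_module = nn.GRUCell(d_model, d_model)   # fast, low-level worker
        self.readout = nn.Linear(d_model, d_model)     # maps state to a prediction
        self.h_steps, self.l_steps, self.outer_steps = h_steps, l_steps, outer_steps

    def forward(self, x):
        # x: (batch, d_model) task encoding; both modules update the same state
        state = torch.zeros_like(x)
        for _ in range(self.outer_steps):               # outer refinement loop
            for _ in range(self.h_steps):               # slow planner updates
                state = self.h_module(x, state)
                for _ in range(self.l_steps):           # fast worker updates
                    state = self.l_module(x, state)
        return self.readout(state)                      # refined prediction

# usage: refine a batch of 8 task encodings
# out = HRMSketch()(torch.randn(8, 256))
```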
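Next, a rough sketch of the kind of per-task augmentation referred to above: random dihedral transforms (rotations/flips) plus color permutations applied to every grid in a task. The function name, transform set, and default count are assumptions for illustration, not the exact augmentation pipeline used by HRM.

```python
import random
import numpy as np

def augment_task(grids, n_augmentations=300, n_colors=10):
    """Generate augmented copies of an ARC task's grids by applying a random
    rotation, optional flip, and random color permutation to each grid."""
    augmented = []
    for _ in range(n_augmentations):
        k = random.randrange(4)                 # rotate by 0/90/180/270 degrees
        flip = random.random() < 0.5            # optional horizontal flip
        perm = np.random.permutation(n_colors)  # random recoloring of the palette
        new_grids = []
        for g in grids:
            g = np.rot90(np.asarray(g), k)
            if flip:
                g = np.fliplr(g)
            new_grids.append(perm[g])           # remap colors via fancy indexing
        augmented.append(new_grids)
    return augmented
```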
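Finally, a sketch of why puzzle_id conditioning ties the model to seen tasks: each training task id indexes a learned embedding table, so a task that was never assigned an id at training time has no representation to look up. The class, layer choices, and table size are hypothetical stand-ins for the real architecture.

```python
import torch
import torch.nn as nn

class PuzzleConditionedModel(nn.Module):
    """Sketch of puzzle_id conditioning: a learned per-task embedding is mixed
    into the input encoding, so only tasks with a training-time id are covered."""

    def __init__(self, num_train_puzzles=1000, d_model=256):
        super().__init__()
        self.puzzle_embedding = nn.Embedding(num_train_puzzles, d_model)
        self.backbone = nn.Linear(d_model, d_model)  # stand-in for the HRM core

    def forward(self, x, puzzle_id):
        # x: (batch, d_model) grid encoding; puzzle_id: (batch,) long tensor
        cond = self.puzzle_embedding(puzzle_id)      # no entry exists for unseen tasks
        return self.backbone(x + cond)
```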