Hasty Briefs (beta)

I got the highest score on ARC-AGI again swapping Python for English

11 hours ago
  • #AI
  • #Machine Learning
  • #ARC-AGI
  • ARC-AGI is a benchmark for abstract pattern recognition, highlighting the gap between human and AI performance.
  • The author achieved new high scores of 79.6% on ARC-AGI v1 and 29.4% on ARC-AGI v2 using Evolutionary Test-Time Compute with English instructions.
  • The method generates natural language instructions and refines them over evolutionary cycles; these instructions take the place of the Python functions used in the author's earlier approach.
  • ARC-AGI v2 tasks are more complex, requiring multi-step reasoning, yet remain solvable by humans with high accuracy.
  • Current LLMs have 'dead reasoning zones': their logic holds in some domains but breaks down in others, so their reasoning is inconsistent.
  • The author suggests that reinforcement learning (RL) can help models develop consistent, transferable reasoning skills.
  • AGI, as defined by François Chollet, requires efficient skill acquisition outside training data, a goal not yet met by LLMs.
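
The evolutionary loop over English instructions mentioned above can be sketched roughly as follows. This is a minimal illustration, not the author's pipeline: the summary gives no implementation details, so `propose`, `revise`, and `follow` are hypothetical callables standing in for LLM calls (draft an instruction, rewrite one, and apply one to a grid), and the population sizes and generation count are arbitrary assumptions.

```python
from typing import Callable, List, Tuple

Grid = List[List[int]]
Example = Tuple[Grid, Grid]  # (input grid, expected output grid)


def score(instruction: str, examples: List[Example],
          follow: Callable[[str, Grid], Grid]) -> float:
    """Fraction of training examples the instruction reproduces exactly."""
    if not examples:
        return 0.0
    hits = sum(1 for grid_in, grid_out in examples
               if follow(instruction, grid_in) == grid_out)
    return hits / len(examples)


def evolve(examples: List[Example],
           propose: Callable[[List[Example]], str],
           revise: Callable[[str, List[Example]], str],
           follow: Callable[[str, Grid], Grid],
           generations: int = 3,
           population: int = 4,
           survivors: int = 2) -> str:
    """Return the best-scoring English instruction found.

    propose, revise and follow are hypothetical stand-ins for LLM calls;
    the default sizes are illustrative, not taken from the article.
    """
    def fitness(candidate: str) -> float:
        return score(candidate, examples, follow)

    # Generation 0: draft a pool of candidate instructions.
    candidates = [propose(examples) for _ in range(population)]
    for _ in range(generations):
        ranked = sorted(candidates, key=fitness, reverse=True)
        if fitness(ranked[0]) == 1.0:
            return ranked[0]  # solves every training example
        # Keep the fittest instructions and add a revised copy of each.
        parents = ranked[:survivors]
        candidates = parents + [revise(p, examples) for p in parents]
    return max(candidates, key=fitness)
```

In a real setting each of the three callables would wrap a model call, and the instruction returned by `evolve` would then be applied to the task's test grid. Scoring against the training pairs is what lets the loop select among instructions without executing any code.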