Poetiq shatters ARC-AGI-2 benchmark at less than half the cost

5 days ago

Poetiq's system has been officially verified to outperform existing methods on the ARC-AGI-2 Semi-Private Test Set, setting a new state of the art.
The system achieved a 54% success rate at $30.57 per problem, surpassing the previous best score of 45% at $77.16 per problem.
Poetiq's meta-system optimizes solutions by leveraging existing frontier models, without building or fine-tuning new models.
The meta-system learns from each task it solves, improving over time, and can be integrated into larger AI systems.
Poetiq is exploring the potential to solve long-horizon tasks by enhancing knowledge extraction mechanisms without model tuning.
The team consists of six experienced researchers and engineers from Google DeepMind and focuses on AI reasoning and knowledge extraction challenges.
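
To put the reported figures in perspective, the short calculation below compares the two results quoted above. It is only a sketch: the variable names are illustrative, and the only inputs are the scores and per-problem costs given in this brief.

```python
# Figures as reported in the brief (ARC-AGI-2 Semi-Private Test Set).
prev_score_pct, prev_cost_usd = 45, 77.16   # previous best
new_score_pct, new_cost_usd = 54, 30.57     # Poetiq

# Cost of the new result as a fraction of the previous best.
cost_ratio = new_cost_usd / prev_cost_usd

# Absolute gain in percentage points and gain relative to the previous score.
point_gain = new_score_pct - prev_score_pct
relative_gain = point_gain / prev_score_pct

print(f"Cost per problem: {cost_ratio:.1%} of the previous best")
print(f"Score gain: {point_gain} points ({relative_gain:.0%} relative)")
```

On these numbers, the per-problem cost is roughly 40% of the previous best, while the score rises by 9 percentage points.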