Benchmarks in Leipzig


Between April 1 and May 15, 2026, a group of 49 mathematicians created a dataset of 100 research-level mathematics questions with known answers.
Most of the work took place during a three-day workshop with 35 participants at the Max Planck Institute for Mathematics in the Sciences in Leipzig, Germany.
The questions were evaluated in three stages using state-of-the-art large language models (LLMs): 41 questions remained unsolved after Stage 1, 16 after Stage 2, and only 2 after Stage 3.
The results suggest that the mathematical reasoning capabilities of LLMs have advanced to the point where they can solve nearly all of these research-level questions.
