Hasty Briefs (beta)

The Secret Meeting Where Mathematicians Struggled to Outsmart AI

a year ago
  • #Mathematics
  • #Large Language Models
  • #Artificial Intelligence
  • A clandestine mathematical conclave was held in mid-May in Berkeley, Calif., where 30 renowned mathematicians tested a reasoning chatbot named o4-mini.
  • The chatbot, powered by OpenAI's reasoning large language model (LLM), demonstrated the ability to solve some of the world's hardest mathematical problems, surprising the mathematicians.
  • o4-mini and similar models, such as Google's Gemini 2.5 Flash, are lighter-weight and more nimble than traditional LLMs, trained on specialized datasets with strong human reinforcement.
  • Epoch AI benchmarked o4-mini against 300 unpublished math questions; traditional LLMs solved less than 2% of them, while o4-mini solved around 20%.
  • A fourth tier of 100 highly challenging questions was introduced, with mathematicians signing NDAs to prevent dataset contamination.
  • During a two-day meeting, mathematicians competed to devise problems that would stump o4-mini, with a $7,500 reward for each unsolved problem.
  • o4-mini solved an open question in number theory in real time, displaying advanced reasoning and even a cheeky attitude.
  • Mathematicians were astonished by the AI's progress, likening it to a 'strong collaborator' and noting its speed compared to human experts.
  • Concerns were raised about over-reliance on o4-mini's results, with fears that its confident tone could let it 'master proof by intimidation'.
  • Discussions turned to the future role of mathematicians, which may shift toward posing questions and interacting with AI to discover new truths.