At a secret Berkeley summit, OpenAI’s o4-mini stunned top mathematicians by solving complex, Ph.D.-level problems with unprecedented speed and creativity, ushering in a new era of collaboration (and concern) between humans and reasoning machines. (Source: Image by RR)

Concerns Grow Over Trust, Confidence, and the Future of Math Research

A recent secret gathering of 30 top mathematicians in Berkeley, California, revealed a startling leap in AI capabilities. During the two-day event, the group pitted OpenAI’s latest reasoning model, o4-mini, against their most challenging mathematical problems. Designed for high-level deduction, this lightweight, nimble large language model stunned researchers by solving complex, Ph.D.-level math questions. Ken Ono, a mathematician at the University of Virginia and a leader at the event, likened its performance to that of a “mathematical genius,” noting the bot’s ability to reason, self-correct and display creative problem-solving in real time.

o4-mini’s performance is the product of a focused training process and fine-tuning with reinforcement learning from human feedback. Unlike earlier models that struggled with novel mathematical challenges, o4-mini was able to solve up to 20% of the questions on a new benchmark called FrontierMath, developed by the nonprofit Epoch AI. These problems, as noted at scientificamerican.com, were tiered in difficulty, ranging from undergraduate level to expert-level research. For the toughest challenges, mathematicians had to use secure messaging apps such as Signal to keep their questions from leaking into training data, underscoring the seriousness and secrecy of the test.

During the in-person showdown in May 2025, participants were offered $7,500 for every question that stumped the bot. Yet by Saturday night, even carefully crafted questions from experts were falling. Ono recounted watching the AI read the research literature, test ideas on simpler versions of a problem and finally deliver a clever, correct solution, adding a touch of sass at the end. The episode shocked participants, revealing a model that not only reasoned like a top-tier graduate student but also completed tasks at superhuman speed.

Despite the excitement, many experts at the event expressed unease. Some feared that the authority with which models like o4-mini deliver answers could lead to misplaced trust or overreliance. The group also speculated about a potential “tier five”: problems no human can solve, which would leave AI as the sole explorer of future mathematical frontiers. Ono and others believe this shift could reframe the role of mathematicians from problem-solvers to creative collaborators who guide AI in the search for truth. As Ono put it, “These large language models are already outperforming most of our best graduate students in the world.”

Read more at scientificamerican.com