Benchmarks in Leipzig
A group of 49 mathematicians held a workshop in Leipzig to compile a dataset of research-level mathematics questions. The resulting collection of 100 questions was evaluated against several large language models (LLMs). The study reports significant improvements in the mathematical reasoning capabilities of LLMs: only two questions remained unsolved after extensive testing.
- The workshop took place at the Max Planck Institute for Mathematics in the Sciences in Leipzig, Germany, from April 1 to May 15, 2026.
- The mathematicians evaluated the questions in three stages, starting with five state-of-the-art LLMs.
- Initially, 41 questions were unsolved, but this number decreased to only 2 after further evaluations.
arXiv:2606.05818 (math), Mathematics > History and Overview [Submitted on 4 Jun 2026]
Title: Benchmarks in Leipzig
Authors: Andrei Balakin, Miklós Bóna, Marie-Charlotte Brandenburg, Clara Briand, Veronica Calvo Cortes, Shelby Cox, Jesus A. De Loera, Danai Deligeorgaki, Hannah Friedman, Tim Gehrunger, Chiara Giardino, Stephen Griffeth, Baran Hashemi, Elena Hoster, Alexander Ivanov, Nupur Jain, Aryaman Jal, Leonie Kayser, Joris Koefler, Kevin Kühn, Mario Kummer, Felix Lotter, René Marczinzik, Victor S. Miller, Alejandro Morales, Greta Panova, Gianni Petrella, Nathan Pflueger, Lakshmi Ramesh, Nikolas Rieke, Carlos Rodriguez, Andrea Rosana, Flavio Salizzoni, Otto T.P.
…