FrontierOR: Benchmarking LLMs' Capacity for Efficient Algorithm Design in Large-Scale Optimization
The paper introduces FrontierOR, a benchmark for evaluating how well large language models (LLMs) design efficient algorithms for large-scale optimization problems. It highlights the limitations of existing benchmarks and presents a systematic evaluation of seven LLMs. The findings indicate that although some models perform well, they still struggle to move from problem formulation to effective optimization algorithms.
- FrontierOR includes 180 tasks derived from diverse operations research papers.
- The strongest one-shot model outperformed Gurobi in only 31% of cases.
- Even strong coding agents achieved only 50% success on selected hard tasks.
arXiv:2605.25246 (cs), Computer Science > Artificial Intelligence
Submitted on 24 May 2026
Authors: Minwei Kong, Chonghe Jiang, Ao Qu, Wenbin Ouyang, Zhaoming Zeng, Xiaotong Guo, Zhekai Li, Junyi Li, Yi Fan, Xinshou Zheng, Xi Jing, Yikai Zhang, Zhiwei Liang, Seonghoo Kim, Runqing Yang, Zijian Zhou, Sirui Li, Han Zheng, Wangyang Ying, Ou Zheng, Chonghuan Wang, Jinglong Zhao, Hanzhang Qin, Cathy Wu, Paul Pu Liang, Jinhua Zhao, Hai Wang
Abstract: Large language models (LLMs) are increasingly used…