EvoTrainer: Co-Evolving LLM Policies and Training Harnesses for Autonomous Agentic Reinforcement Learning
EvoTrainer is an autonomous training framework that co-evolves an LLM policy together with the training harness used to train it. Instead of treating autonomous training as a search over recipes inside a fixed harness, it also revises the harness itself based on empirical feedback from training. The authors report that it matches or exceeds human-engineered reinforcement learning references on tasks such as mathematical reasoning and software engineering.
- EvoTrainer co-evolves LLM policies and training harnesses through empirical feedback.
- The framework has been evaluated on tasks such as mathematical reasoning and competitive programming.
- EvoTrainer outperforms human-engineered RL references on long-horizon agentic software engineering.
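The summary does not describe EvoTrainer's actual algorithm, so what follows is only a toy sketch of what "co-evolving a policy and a harness through empirical feedback" can mean: an outer loop that alternates a training phase with a harness revision driven by per-task evaluation results. Everything here is a hypothetical illustration, not the paper's method. The `Harness` class, its curriculum weights, and the functions `train_policy`, `evaluate`, `revise_harness`, and `co_evolve` are invented for this example, and "skill" is a single success rate per task type standing in for a real LLM policy.

```python
from dataclasses import dataclass


@dataclass
class Harness:
    # Hypothetical harness knob: sampling weight per task type (a curriculum).
    weights: dict[str, float]


def train_policy(skill: dict[str, float], harness: Harness, lr: float = 0.5) -> dict[str, float]:
    """Toy 'RL phase': each task's success rate moves toward 1.0 in
    proportion to how often the harness samples that task type."""
    total = sum(harness.weights.values())
    return {
        task: s + lr * (harness.weights[task] / total) * (1.0 - s)
        for task, s in skill.items()
    }


def evaluate(skill: dict[str, float]) -> dict[str, float]:
    # Per-task success rates, kept separate instead of averaged into one scalar.
    return dict(skill)


def revise_harness(harness: Harness, report: dict[str, float], boost: float = 2.0) -> tuple[Harness, str]:
    """Toy 'harness evolution': upweight the task type with the lowest success rate."""
    weakest = min(report, key=report.get)
    new_weights = dict(harness.weights)  # copy so the old harness is left unchanged
    new_weights[weakest] *= boost
    return Harness(new_weights), weakest


def co_evolve(skill: dict[str, float], harness: Harness, rounds: int = 3):
    """Alternate policy training and harness revision; record each round's bottleneck."""
    history = []
    for _ in range(rounds):
        skill = train_policy(skill, harness)
        report = evaluate(skill)
        harness, weakest = revise_harness(harness, report)
        history.append(weakest)
    return skill, harness, history


skill = {"math": 0.6, "code": 0.3, "swe": 0.1}
harness = Harness({"math": 1.0, "code": 1.0, "swe": 1.0})
final_skill, final_harness, history = co_evolve(skill, harness)
print(history)                # → ['swe', 'swe', 'code']
print(final_harness.weights)  # → {'math': 1.0, 'code': 2.0, 'swe': 4.0}
```

The point of the toy is that the harness is something the loop modifies rather than a fixed backdrop: the curriculum first shifts weight toward the weakest task type ("swe") and later toward "code" once "swe" is no longer the bottleneck.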
Opening excerpt (first ~120 words):
arXiv:2606.03108 [cs.AI] [Submitted on 2 Jun 2026]
Title: EvoTrainer: Co-Evolving LLM Policies and Training Harnesses for Autonomous Agentic Reinforcement Learning
Authors: Guhong Chen, Yingcheng Shi, Yongbin Li, Binhua Li, Xander Xu, Hu Wei, Shiwen Ni, Min Yang, Jieping Ye
Abstract: Autonomous LLM training is often framed as recipe search, which leaves the training harness largely static. This limitation sharpens in agentic RL, where shifting bottlenecks and scalar rewards mask diverse failure modes.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.
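The abstract's observation that scalar rewards mask diverse failure modes can be made concrete with a toy example that is not taken from the paper: two batches of agent rollouts earn the same mean reward, yet fail for entirely different reasons. The failure labels and the `summarize` helper are hypothetical; the sketch only shows why a harness that logs nothing but the mean reward would treat the two batches as equivalent.

```python
from collections import Counter
from statistics import mean

# Each rollout: (scalar reward, failure label, or None if the rollout succeeded).
# The failure labels are hypothetical examples of agentic failure modes.
batch_a = [(1.0, None), (0.0, "tool_call_format"), (1.0, None), (0.0, "tool_call_format")]
batch_b = [(1.0, None), (0.0, "timeout"), (1.0, None), (0.0, "wrong_patch")]


def summarize(batch):
    """Return the mean scalar reward and a count of each failure label."""
    reward = mean(r for r, _ in batch)
    failures = Counter(label for _, label in batch if label is not None)
    return reward, failures


for name, batch in [("A", batch_a), ("B", batch_b)]:
    reward, failures = summarize(batch)
    print(name, reward, dict(failures))
# → A 0.5 {'tool_call_format': 2}
# → B 0.5 {'timeout': 1, 'wrong_patch': 1}
```

Both batches score 0.5, so the scalar alone gives no hint that batch A needs a fix to tool-call formatting while batch B suffers from timeouts and incorrect patches.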