What Makes Interaction Trajectories Effective for Training Terminal Agents?
The paper investigates what makes interaction trajectories effective for training terminal agents. It finds that an agent's higher standalone performance does not necessarily translate into better teaching outcomes, and it emphasizes the role of environment-grounded supervision in improving how well trained agents generalize.
- The research highlights a 'pedagogical paradox' in which lower-scoring agents can be better teachers than higher-scoring ones.
- Environment-Grounded Supervision (EGS) is identified as a key factor that helps students internalize robust problem-solving routines.
- The findings suggest that future agent post-training should focus on the systematic design of interaction structures rather than on outcome-matching alone.
Opening excerpt (first ~120 words)
arXiv:2606.03461 (cs), Computer Science > Artificial Intelligence. Submitted on 2 Jun 2026.
Title: What Makes Interaction Trajectories Effective for Training Terminal Agents?
Authors: Sidi Yang, Chaofan Tao, Jierun Chen, Tiezheng Yu, Ruoyu Wang, Yuxin Jiang, Yiming Du, Wendong Xu, Jing Xiong, Taiqiang Wu, Lifeng Shang, Xiaohui Li, Ngai Wong, Haoli Bai
Abstract: Stronger code agents are commonly assumed to be superior teachers for post-training, yet this assumption remains poorly disentangled from task difficulty, harness design, and student capacity.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.