VitaBench 2.0: Evaluating Personalized and Proactive Agents in Long-Term User Interactions
VitaBench 2.0 is a benchmark for evaluating personalized and proactive agents in long-term user interactions. It addresses the limitations of existing benchmarks by focusing on user preferences and the challenges of real-world decision-making. The study reveals significant gaps in current models' ability to personalize interactions from fragmented user data.
- VitaBench 2.0 introduces a framework for assessing agent behavior in long-term user interactions.
- The benchmark emphasizes the importance of understanding user preferences from fragmented interactions.
- Results indicate that even state-of-the-art models struggle with real-world personalization challenges.
Opening excerpt (first ~120 words)
arXiv:2605.27141 (cs.AI), submitted on 26 May 2026
Title: VitaBench 2.0: Evaluating Personalized and Proactive Agents in Long-Term User Interactions
Authors: Yuxin Chen, Yi Zhang, Zhengzhou Cai, Yaorui Shi, Zhiyuan Yao, Chenhang Cui, Jingnan Zheng, Yaqi Huo, Xi Su, Qi Gu, Xunliang Cai, Xiang Wang, An Zhang, Tat-Seng Chua
Abstract: Large language models (LLMs) have evolved into interactive agents that collaborate with users in real-world tasks.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.