SimuWoB: Simulating Real-World Mobile Apps for Fast and Faithful GUI Agent Benchmarking
The paper introduces SimuWoB, a synthetic benchmark for evaluating mobile GUI agents. It addresses the limitations of existing benchmarks by providing a diverse set of challenging tasks and environments. The evaluation reveals significant weaknesses in current state-of-the-art agents, particularly on long-horizon tasks.
- SimuWoB includes 120 challenging tasks that vary in type and difficulty for mobile GUI agents.
- The average success rate of state-of-the-art mobile GUI agents was found to be only 27.92%.
- The success rate dropped to 17.82% on long-horizon tasks, indicating substantial weaknesses in handling complex interactions.
Opening excerpt (first ~120 words)
arXiv:2605.25160 (cs), submitted on 24 May 2026
Authors: Guohong Liu, Jialei Ye, Pengzhi Gao, Wei Liu, Jian Luan, Yunxin Liu, Yuanchun Li
Abstract: Mobile GUI agents powered by large language models have progressed rapidly, creating urgent needs for realistic and comprehensive evaluation.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.