
DeskCraft: Benchmarking Desktop Agents on Professional Workflows and Human-in-the-Loop Collaboration

#artificial intelligence · #desktop agents · #human collaboration
⚡ TL;DR · AI summary

The DeskCraft paper introduces a benchmark for evaluating desktop agents on professional workflows that require human-in-the-loop collaboration. It addresses limitations of existing benchmarks by focusing on long-horizon tasks and proactive human-agent interaction. The study evaluates a range of agents and finds persistent difficulty in completing complex workflows and in seeking clarification from users.

Original article: arXiv cs.AI
Opening excerpt (first ~120 words)

Computer Science > Artificial Intelligence
arXiv:2606.03103 (cs)
[Submitted on 2 Jun 2026]

Title: DeskCraft: Benchmarking Desktop Agents on Professional Workflows and Human-in-the-Loop Collaboration
Authors: Wenkai Wang, Tao Xiong, Jingchen Ni, Yunpeng Bao, Xiyun Li, Tianqi Liu, Hongcan Guo, Zilong Huang, Shengyu Zhang

Abstract: Real-world professional desktop workflows in specialized creative and engineering software unfold over long horizons and often require human-in-the-loop coordination, where agents proactively seek necessary information and users provide additional instructions,…

Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.
