VISTA: An End-to-End Benchmark for Visual Spec-to-Web-App Coding Agents
The article introduces VISTA, a benchmark for evaluating how well LLM-based agents generate web applications from visual specifications. Unlike prior code generation benchmarks that focus on algorithmic tasks, VISTA emphasizes realistic UI-centric development. The benchmark defines five prompt-information conditions and aims to provide a robust evaluation framework for agent-based software engineering research.
- VISTA targets realistic UI-centric development for web-app generation.
- It defines five prompt-information conditions varying in visual and structural fidelity.
- The evaluation combines DOM-grounded reference matching and behavior-specific browser tests.
Opening excerpt (first ~120 words)
Computer Science > Software Engineering
arXiv:2605.26144 (cs) [Submitted on 22 May 2026]
Title: VISTA: An End-to-End Benchmark for Visual Spec-to-Web-App Coding Agents
Authors: JunJia Guo, Yuhang Yao, Jiawei (Joe) Zhou, Jingdi Chen
Abstract: We present VISTA (VIsual Spec-To-App Benchmark), a benchmark for evaluating the end-to-end web-app generation capabilities of LLM-based agents. Unlike prior code generation benchmarks that focus on algorithmic tasks, VISTA targets realistic UI-centric development, where agents must produce functional, visually coherent applications from underspecified inputs.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.