Webwright: A terminal is all you need for web agents
Webwright is a terminal-native harness for web agents. Instead of driving a single live browser, the agent works from a terminal. From there it can run several browser sessions and turn a web task into a reusable program, while logs and outputs are kept in a local workspace. The authors report higher long-horizon browsing accuracy than traditional single-session web agents.
- Webwright lets agents launch, inspect, and discard browser sessions while preserving outputs in a local workspace.
- Webwright reports 86.7% accuracy on the Online-Mind2Web benchmark and a score of 60.8% on Odysseys, a long-horizon browsing benchmark.
- Webwright itself is a minimal harness: three core modules totaling roughly 1,000 lines of code.
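The split between disposable browser sessions and a persistent workspace can be made concrete with a minimal Python sketch. This is not Webwright's code. `BrowserSession`, `Workspace`, and `run_demo` are hypothetical names, and the session is a stub that only records a URL instead of driving a real browser. The sketch follows the lifecycle in the first bullet: two sessions are launched side by side, inspected, and discarded, and the log and a saved snapshot remain in the workspace directory afterwards.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class BrowserSession:
    """Hypothetical stub for a disposable browser session (no real browser)."""
    session_id: int
    url: str = "about:blank"
    alive: bool = True

    def goto(self, url: str) -> None:
        if not self.alive:
            raise RuntimeError(f"session {self.session_id} was discarded")
        self.url = url

    def snapshot(self) -> dict:
        # "Inspect" the session; a real harness might capture a screenshot or DOM.
        return {"session": self.session_id, "url": self.url}

    def close(self) -> None:
        self.alive = False


@dataclass
class Workspace:
    """Hypothetical local workspace: files here outlive every browser session."""
    root: Path
    next_id: int = 0

    def launch(self) -> BrowserSession:
        self.next_id += 1
        self.log(f"launched session {self.next_id}")
        return BrowserSession(self.next_id)

    def log(self, message: str) -> None:
        with open(self.root / "agent.log", "a", encoding="utf-8") as f:
            f.write(message + "\n")

    def save_output(self, name: str, data: dict) -> Path:
        path = self.root / name
        path.write_text(json.dumps(data), encoding="utf-8")
        return path


def run_demo(root: Path) -> list[str]:
    ws = Workspace(root)
    search, cart = ws.launch(), ws.launch()  # two sessions at the same time
    search.goto("https://example.com/search?q=laptop")
    cart.goto("https://example.com/cart")
    ws.save_output("search.json", search.snapshot())
    for session in (search, cart):
        session.close()  # discard the browser...
        ws.log(f"discarded session {session.session_id}")
    # ...but the log and the saved output are still in the workspace.
    return sorted(p.name for p in root.iterdir())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(run_demo(Path(tmp)))  # ['agent.log', 'search.json']
```

Keeping state in files rather than in the session object is the design point. Once a session is closed, any further `goto` raises an error, but nothing the agent needs later is lost.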
Opening excerpt (first ~120 words):
Terminal-native web agents
A terminal is all you need for web agents. Webwright gives the model a terminal, a local workspace, and the freedom to write code that launches, inspects, and discards browser sessions. The output is not just a completed task, but a reusable program.
Links: GitHub, Microsoft Research Blog
Key figures: 3 core modules; ~1K lines of harness code; 86.7% Online-Mind2Web accuracy; 60.8% Odysseys score
Paradigm shift
In Webwright, the agent can launch multiple browser sessions from the terminal. Traditional web agents keep one browser session alive and predict the next click, type, or scroll. Webwright separates the agent from that session: the browser can be launched, inspected, and discarded, while code, logs, screenshots, and outputs persist in the local workspace.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is on GitHub.
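The claim that the output is "a reusable program" can be illustrated with a short Python sketch. This is an assumption about what such an artifact might look like, not code from the Webwright repository. `PROGRAM`, `save_program`, `rerun`, the `data-price` attribute, and the HTML snapshot are all hypothetical. The idea is that once the program sits in the workspace, repeating the task means running a script as an ordinary process, not asking the agent to click through the site again.

```python
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical program an agent might leave in its workspace after a task:
# it can be rerun on a new page snapshot without the agent in the loop.
PROGRAM = textwrap.dedent('''\
    import re
    import sys
    from pathlib import Path

    html = Path(sys.argv[1]).read_text(encoding="utf-8")
    prices = [float(p) for p in re.findall(r'data-price="([0-9.]+)"', html)]
    print(min(prices))
''')


def save_program(workspace: Path, name: str = "cheapest_price.py") -> Path:
    """Write the program into the workspace so it persists after the task."""
    path = workspace / name
    path.write_text(PROGRAM, encoding="utf-8")
    return path


def rerun(program: Path, page: Path) -> str:
    """Run the saved program as a separate process and return what it prints."""
    result = subprocess.run(
        [sys.executable, str(program), str(page)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ws = Path(tmp)
        page = ws / "page.html"
        page.write_text('<li data-price="19.99"></li><li data-price="5.50"></li>', encoding="utf-8")
        print(rerun(save_program(ws), page))  # 5.5
```

Parsing a saved HTML snapshot keeps the example offline. A program written by an agent could instead launch a fresh browser session and read the live page before extracting the answer.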