1 link tagged with all of: evaluation + benchmarks + gui-agents
Click any tag below to further narrow down your results
Links
ScreenSuite is introduced as the most comprehensive evaluation suite for GUI agents, designed to benchmark vision language models (VLMs) across various capabilities such as perception, grounding, and multi-step actions. It provides a modular and vision-only framework for evaluating GUI agents in realistic scenarios, allowing for easier integration and reproducibility in AI research.