ScreenSuite is introduced as the most comprehensive evaluation suite for GUI agents. It benchmarks vision-language models (VLMs) on capabilities including perception, grounding, and multi-step actions. The framework is modular and vision-only, and it evaluates GUI agents in realistic scenarios, which makes it easier to integrate into existing workflows and to reproduce results in AI research.