2 min read
|
Saved February 14, 2026
Do you care about this?
This article introduces WebGym, a large open-source environment for training visual web agents on nearly 300,000 tasks drawn from real websites. It details a simple reinforcement learning recipe that lifts a fine-tuned open model's success rate on unseen websites well above that of proprietary models.
If you do, here's more
WebGym is a new open-source environment for training visual web agents on realistic tasks. It stands out for its scale: nearly 300,000 tasks sourced from real websites, which addresses the limitations of smaller, artificial task sets. The tasks vary in difficulty and are evaluated against rubrics, giving broad coverage for training. The authors use a straightforward reinforcement learning (RL) recipe in which agents learn from their own interaction traces, with task rewards guiding the updates.
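The paper's exact training recipe is not spelled out here, but the core idea of learning from self-generated, reward-scored traces can be sketched in a few lines of toy Python. Everything below (the weight dict standing in for a policy, the fake rubric, all names) is an illustrative stand-in, not WebGym's API:

```python
import random

def run_episode(policy, task):
    """Roll out a (toy) agent on a task and return its trace and reward.

    A real web agent would emit clicks and keystrokes and be scored by the
    task's rubric; here a 'policy' is just a dict of action weights.
    """
    trace = [max(policy, key=policy.get) if random.random() < 0.8
             else random.choice(list(policy)) for _ in range(3)]
    reward = 1.0 if trace.count("good") >= 2 else 0.0  # rubric stand-in
    return trace, reward

def reinforce_from_traces(policy, tasks, lr=0.5):
    """Reward-weighted update: actions from the agent's own rewarded
    traces are reinforced; unrewarded traces leave the policy unchanged."""
    for task in tasks:
        trace, reward = run_episode(policy, task)
        for action in trace:
            policy[action] += lr * reward
    return policy

random.seed(0)
policy = reinforce_from_traces({"good": 1.0, "bad": 1.0}, tasks=range(50))
```

After training, the weight on the rewarded action dominates: the agent has distilled its own successful behavior back into the policy, which is the essence of learning from self-generated traces with task rewards.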
To make training efficient at this scale, the creators of WebGym built a high-throughput asynchronous rollout system that samples trajectories 4 to 5 times faster than traditional methods; fast sampling is essential for scaling the training process, and the breadth, depth, and size of the task set yield further gains. Fine-tuning Qwen-3-VL-8B-Instruct on this dataset raised success rates on out-of-distribution tasks from 26.2% to 42.9%, surpassing proprietary models such as GPT-4o (27.1%) and GPT-5-Thinking (29.8%). The out-of-distribution test set consists of tasks from websites never encountered during training, underscoring the robustness of agents trained in WebGym.
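The throughput argument behind asynchronous rollouts is easy to see with a toy asyncio sketch (the rollout function and its timings are hypothetical stand-ins for real browser interactions, not WebGym code): when rollouts run concurrently, total wall time tracks the slowest single rollout rather than the sum of all of them.

```python
import asyncio
import time

async def rollout(env_id: int, steps: int = 3) -> list:
    """Simulate one browser-environment rollout. Each step blocks on
    (mock) page-load latency, so many rollouts can overlap in flight."""
    trace = []
    for t in range(steps):
        await asyncio.sleep(0.05)  # stand-in for a real page interaction
        trace.append((env_id, t))
    return trace

async def gather_trajectories(num_envs: int) -> list:
    # Launch all rollouts concurrently instead of one after another --
    # the core idea behind a high-throughput asynchronous rollout system.
    return await asyncio.gather(*(rollout(i) for i in range(num_envs)))

start = time.perf_counter()
trajectories = asyncio.run(gather_trajectories(num_envs=8))
elapsed = time.perf_counter() - start
# Sequential sampling would take ~8 * 3 * 0.05 = 1.2 s of simulated latency;
# the concurrent version finishes in roughly one rollout's worth of time.
```

The same pattern, with real browsers and GPU inference in place of `asyncio.sleep`, is what makes trajectory sampling fast enough to feed large-scale RL training.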