# evaluation → machine-learning → reinforcement-learning

3 links tagged with all of: evaluation + machine-learning + reinforcement-learning

Links

WebGym: Scaling Training Environments for Visual Web Agents with Realistic Tasks

This article introduces WebGym, a large open-source environment for training visual web agents on nearly 300,000 tasks drawn from real websites. It describes a reinforcement learning approach that raises agents' success rates on unseen tasks relative to other models.

Saved by tldr-importer · Last saved February 14, 2026 · 2 min read

Tags: web-agents, reinforcement-learning, machine-learning, tasks, evaluation

WavReward: Spoken Dialogue Models With Generalist Reward Evaluators

WavReward is a reward feedback model, built on audio language models, that evaluates spoken dialogue systems on both intelligence quotient (IQ) and emotional quotient (EQ). The work introduces a specialized evaluator trained with multi-sample feedback and reinforcement learning, along with the ChatReward-30K dataset, and reports that it significantly outperforms existing evaluation models in accuracy and subjective testing across various spoken dialogue scenarios.

Saved by tldr-importer · Last saved October 29, 2025 · 2 min read

Tags: spoken-dialogue, evaluation, audio-models, reinforcement-learning, machine-learning

JudgeLRM: Large Reasoning Models as a Judge

JudgeLRM introduces an approach to using Large Language Models (LLMs) as evaluators, particularly for complex reasoning tasks. Trained with reinforcement learning and judge-wise rewards, JudgeLRM models significantly outperform traditional Supervised Fine-Tuning methods and current leading models on tasks that require deep reasoning.

Saved by tldr-importer · Last saved October 29, 2025 · 1 min read

Tags: large-language-models, reasoning, reinforcement-learning, evaluation, machine-learning