# reinforcement-learning → policy-optimization

Links

GitHub - THUDM/CaRR: This repository contains the code and data for the paper "Chaining the Evidence: Robust Reinforcement Learning for Deep Search Agents with Citation-Aware Rubric Rewards".

The paper behind this repository introduces Citation-aware Rubric Rewards (CaRR), a framework for improving reinforcement learning of deep search agents. It addresses shortcut exploitation and hallucination by promoting comprehensive reasoning and evidence-based decision-making. According to the summary, the method outperforms traditional outcome-based approaches across the evaluations reported.

Saved by tldr-importer · Last saved February 14, 2026 · 2 min read

Tags: reinforcement-learning, deep-search, rubric-rewards, evidence-grounding, policy-optimization