1 link tagged with all of: reinforcement-learning + policy-optimization
Links
This article presents Citation-aware Rubric Rewards (CaRR), a framework for improving reinforcement learning for deep search agents. It addresses shortcut exploitation and hallucination by promoting comprehensive reasoning and evidence-based decision-making. According to the article, the method outperforms traditional outcome-based approaches in various evaluations.