Saved October 29, 2025
This paper introduces a novel actor-critic algorithm that achieves optimal sample efficiency in reinforcement learning, with a sample complexity of \(O(dH^5 \log|\mathcal{A}|/\epsilon^2 + dH^4 \log|\mathcal{F}|/\epsilon^2)\). The algorithm combines optimism with off-policy critic estimation, and is extended to the Hybrid RL setting, where it demonstrates efficiency gains from incorporating offline data. Numerical experiments support the theoretical findings.
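To make the actor-critic structure concrete, here is a minimal sketch of a tabular actor-critic loop with a count-based optimism bonus on a toy two-state MDP. This is purely illustrative: the environment, the bonus form, and all hyperparameters are assumptions for the sketch, not the paper's algorithm or its theoretical construction.

```python
import math
import random

random.seed(0)

# Hypothetical toy MDP (2 states, 2 actions), not from the paper:
# taking action 1 in state 1 yields reward 1; all else yields 0.
# The chosen action also determines the next state.
def step(s, a):
    reward = 1.0 if (s == 1 and a == 1) else 0.0
    return reward, a

gamma = 0.9
Q = [[0.0, 0.0], [0.0, 0.0]]       # critic: tabular Q estimates
theta = [[0.0, 0.0], [0.0, 0.0]]   # actor: softmax logits
N = [[1, 1], [1, 1]]               # visit counts for the optimism bonus
alpha_critic, alpha_actor, bonus = 0.1, 0.05, 0.5

def policy(s):
    m = max(theta[s])
    e = [math.exp(t - m) for t in theta[s]]
    z = sum(e)
    return [p / z for p in e]

s = 0
for _ in range(5000):
    probs = policy(s)
    a = random.choices([0, 1], probs)[0]
    r, s_next = step(s, a)
    N[s][a] += 1
    # Optimistic next-state value: a crude count-based bonus standing in
    # for the optimism mechanism mentioned in the summary.
    v_next = max(Q[s_next][b] + bonus / math.sqrt(N[s_next][b]) for b in (0, 1))
    td_error = r + gamma * v_next - Q[s][a]
    Q[s][a] += alpha_critic * td_error        # critic: TD update
    # Actor: policy-gradient step weighted by the advantage estimate.
    adv = Q[s][a] - sum(p * q for p, q in zip(probs, Q[s]))
    for b in (0, 1):
        grad = (1.0 if b == a else 0.0) - probs[b]
        theta[s][b] += alpha_actor * adv * grad
    s = s_next

print(policy(1))  # the learned policy in state 1 should favor action 1
```

The critic here learns on-policy for simplicity; the paper's contribution involves off-policy critic estimation and a specific optimism construction with the stated sample-complexity guarantee, neither of which this toy loop reproduces.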