A novel actor-critic algorithm is introduced that achieves optimal sample efficiency in reinforcement learning, attaining a sample complexity of \(O(dH^5 \log|\mathcal{A}|/\epsilon^2 + d H^4 \log|\mathcal{F}|/\epsilon^2)\). The algorithm combines optimism with off-policy critic estimation, and its extension to the Hybrid RL setting shows further efficiency gains when offline data is incorporated. Numerical experiments support the theoretical findings.
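To make the high-level description concrete, the following is a minimal, illustrative sketch (not the paper's algorithm) of a tabular optimistic actor-critic loop in the spirit described above: the critic is re-fit by policy evaluation on all transitions collected so far (off-policy data reuse) with a count-based bonus standing in for optimism, and the actor takes an exponential-weights step on the optimistic Q-estimates. All environment quantities, the bonus form, and the step size `eta` are hypothetical choices made only for this sketch.

```python
import numpy as np

# Illustrative sketch only (not the paper's algorithm): an episodic, tabular
# optimistic actor-critic loop. The critic is re-fit by backward policy
# evaluation over ALL transitions collected so far (off-policy data reuse),
# with a count-based bonus standing in for optimism; the actor takes an
# exponential-weights step on the optimistic Q-estimates. Every quantity
# below (S, A, H, P, R, eta, the bonus) is a hypothetical choice.

rng = np.random.default_rng(0)
S, A, H = 5, 3, 4                                 # states, actions, horizon
P = rng.dirichlet(np.ones(S), size=(H, S, A))     # P[h, s, a] = distribution over next states
R = rng.uniform(size=(H, S, A))                   # deterministic rewards in [0, 1]
eta = 0.5                                         # actor step size (assumed)

logits = np.zeros((H, S, A))                      # actor: pi_h(a|s) proportional to exp(logits[h, s, a])
buffer = []                                       # replay buffer of (h, s, a, r, s')

for episode in range(200):
    pi = np.exp(logits - logits.max(-1, keepdims=True))
    pi /= pi.sum(-1, keepdims=True)

    # Roll out one episode with the current actor and store the transitions.
    s = rng.integers(S)
    for h in range(H):
        a = rng.choice(A, p=pi[h, s])
        s_next = rng.choice(S, p=P[h, s, a])
        buffer.append((h, s, a, R[h, s, a], s_next))
        s = s_next

    # Critic: fitted policy evaluation of the current actor over the whole
    # buffer, backwards in h, plus an optimism bonus on rarely visited pairs.
    Q = np.zeros((H, S, A))
    counts = np.zeros((H, S, A))
    targets = np.zeros((H, S, A))
    for h in range(H - 1, -1, -1):
        for (hh, ss, aa, r, s_next) in buffer:
            if hh != h:
                continue
            v_next = 0.0 if h == H - 1 else float(pi[h + 1, s_next] @ Q[h + 1, s_next])
            counts[h, ss, aa] += 1
            targets[h, ss, aa] += r + v_next
        mean = np.where(counts[h] > 0, targets[h] / np.maximum(counts[h], 1), 0.0)
        Q[h] = np.clip(mean + 1.0 / np.sqrt(np.maximum(counts[h], 1)), 0.0, H)

    # Actor: exponential-weights (softmax) ascent step on the optimistic critic.
    logits += eta * Q
```

In bounds of this form, the \(\log|\mathcal{A}|\) term typically arises from exponential-weights/mirror-descent actor updates over the action set and the \(\log|\mathcal{F}|\) term from uniform convergence over the critic function class; the tabular critic above stands in for a general function class purely for readability.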