2 links tagged with all of: reinforcement-learning + optimization + machine-learning
Links
The article explains reinforcement learning through a psychological lens, focusing on feedback mechanisms in both humans and computers. It outlines how computer programs learn by receiving scores and updating their responses, and it emphasizes a specific approach called Reformist RL, which simplifies implementation for generative models.
TreeRL is a novel reinforcement learning framework that integrates on-policy tree search to enhance the training of language models. By incorporating intermediate supervision and optimizing search efficiency, TreeRL addresses issues common in traditional reinforcement learning methods, such as distribution mismatch and reward hacking. Experimental results show that TreeRL outperforms existing methods in math and code reasoning tasks, showcasing the effectiveness of tree search in this domain.