2 links tagged with all of: reinforcement-learning + self-improvement
Links
This article explores SOAR, a method in which a pre-trained model generates synthetic problems that help another model learn. It emphasizes creating effective learning tasks rather than focusing solely on problem-solving accuracy. The findings suggest that this self-improvement approach can help models overcome learning difficulties without additional curated data.
This paper introduces a method for improving visual reasoning that relies on self-improvement and reduces the number of training samples needed. Using Monte Carlo Tree Search to quantify sample difficulty, the authors filter a large dataset down to 11k challenging samples, and their model, ThinkLite-VL, significantly outperforms existing models. Evaluation results show a 7% increase in average performance and state-of-the-art accuracy on several benchmarks.