This paper introduces a method for improving visual reasoning through self-improvement while sharply reducing the number of training samples required. The authors use Monte Carlo Tree Search (MCTS) to quantify the difficulty of each sample and filter a large dataset down to 11k challenging samples. Their model, ThinkLite-VL, shows significant performance improvements over existing models. Evaluation results show a 7% increase in average performance and state-of-the-art accuracy on several benchmarks.
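The difficulty-based filtering described above can be illustrated with a minimal sketch. It is not the authors' implementation: the tree search is replaced here by a plain retry loop over a hypothetical `attempt` callback, and the names `Sample`, `iterations_to_solve`, `filter_hard_samples`, `min_iterations`, and `max_iterations` are assumptions introduced for illustration. The only idea taken from the summary is that difficulty is quantified by how much search a sample needs, and that only the challenging samples are kept.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Sample:
    question: str
    answer: str


# Hypothetical callback: given a sample and the current iteration index,
# return the model's answer at that point in the search.
Attempt = Callable[[Sample, int], str]


def iterations_to_solve(
    sample: Sample, attempt: Attempt, max_iterations: int
) -> Optional[int]:
    """Return the first iteration that yields the correct answer, or None.

    Stand-in for a search procedure such as MCTS: a real version would
    expand and score reasoning steps instead of simply retrying.
    """
    for i in range(1, max_iterations + 1):
        if attempt(sample, i) == sample.answer:
            return i
    return None


def filter_hard_samples(
    samples: Iterable[Sample],
    attempt: Attempt,
    min_iterations: int,
    max_iterations: int,
) -> List[Sample]:
    """Keep samples that need more than min_iterations or are never solved."""
    hard = []
    for sample in samples:
        n = iterations_to_solve(sample, attempt, max_iterations)
        if n is None or n > min_iterations:
            hard.append(sample)
    return hard
```

In this sketch, a sample solved on the first few iterations is treated as easy and dropped, while samples that exhaust the iteration budget are kept as the hardest cases. Both thresholds are free parameters here and would need to be tuned against the actual model and dataset.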
visual-reasoning
monte-carlo-tree-search
data-efficiency
reinforcement-learning
self-improvement