This paper introduces a self-improvement method for visual reasoning that reduces the number of training samples required. The authors use Monte Carlo Tree Search (MCTS) to quantify the difficulty of each sample and filter a large dataset down to 11k challenging samples. Trained on this subset, their model, ThinkLite-VL, improves on existing models: evaluation results show a 7% increase in average performance and state-of-the-art accuracy on several benchmarks.
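The difficulty-based filtering step can be illustrated with a minimal sketch. It is not the authors' implementation: the MCTS procedure is replaced by a hypothetical `attempt` callable that stands in for one search iteration, difficulty is approximated by the number of iterations needed before the correct answer appears, and the names `Sample`, `iterations_to_solve`, `filter_hard`, `min_iters`, and `max_iters` are all assumptions made for illustration. Keeping unsolved samples is likewise an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Sample:
    sample_id: str
    question: str
    answer: str


# Hypothetical stand-in for one search iteration: given a sample and the
# 1-based iteration index, return the model's predicted answer.
Attempt = Callable[[Sample, int], str]


def iterations_to_solve(sample: Sample, attempt: Attempt, max_iters: int) -> Optional[int]:
    """Return the first iteration whose prediction matches the reference
    answer, or None if the sample is not solved within max_iters."""
    for i in range(1, max_iters + 1):
        if attempt(sample, i) == sample.answer:
            return i
    return None


def filter_hard(
    samples: List[Sample], attempt: Attempt, min_iters: int, max_iters: int
) -> List[Sample]:
    """Keep samples that need more than min_iters iterations, plus samples
    that stay unsolved within the budget (an assumption of this sketch)."""
    kept = []
    for sample in samples:
        n = iterations_to_solve(sample, attempt, max_iters)
        if n is None or n > min_iters:
            kept.append(sample)
    return kept
```

In this sketch, raising `min_iters` shrinks the retained subset toward the hardest samples, while `max_iters` caps the search budget spent on each sample.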
Thyme approaches image processing by autonomously generating and executing code for complex visual reasoning tasks. It uses a two-stage training strategy that combines supervised fine-tuning and reinforcement learning, together with the GRPO-ATS algorithm, to improve performance in high-resolution perception.