This article outlines a method for training judges for Vision-Language Models (VLMs) without human annotations. The approach uses self-synthesized data in an iterative process to improve judgment accuracy, resulting in notable performance gains on various evaluation benchmarks.
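The summary does not say how the iterative process works, so the following is only a rough sketch of one way such a loop could be organized, not the article's actual procedure. It assumes synthetic response pairs whose better member is known by construction, and it keeps only the judgments that agree with that known label before each update. All names here (`filter_by_agreement`, `iterate_judge`, `length_judge`, `toy_pairs`, `no_op_fine_tune`) are hypothetical, and the toy judge compares plain strings rather than calling a real VLM.

```python
from typing import Callable, List, Tuple

# (response_a, response_b, index of the response known to be better).
# Assumption: the label comes from how the pair was synthesized, not from a human.
Pair = Tuple[str, str, int]
Judge = Callable[[str, str], int]


def filter_by_agreement(judge: Judge, pairs: List[Pair]) -> List[Pair]:
    """Keep only the pairs where the judge picks the known-better response."""
    return [(a, b, better) for a, b, better in pairs if judge(a, b) == better]


def iterate_judge(
    judge: Judge,
    make_pairs: Callable[[], List[Pair]],
    fine_tune: Callable[[Judge, List[Pair]], Judge],
    rounds: int,
) -> Tuple[Judge, List[float]]:
    """Run several rounds; return the final judge and per-round agreement rates."""
    agreement: List[float] = []
    for _ in range(rounds):
        pairs = make_pairs()
        if not pairs:
            break  # nothing to learn from in this round
        kept = filter_by_agreement(judge, pairs)
        agreement.append(len(kept) / len(pairs))
        # Hypothetical update step: a real system would train on `kept` here.
        judge = fine_tune(judge, kept)
    return judge, agreement


# Toy stand-ins so the sketch runs without any model.
def length_judge(a: str, b: str) -> int:
    """Placeholder judge: prefers the longer response."""
    return 0 if len(a) >= len(b) else 1


def toy_pairs() -> List[Pair]:
    return [
        ("a detailed, correct answer", "wrong", 0),
        ("short", "a long but degraded answer", 0),
    ]


def no_op_fine_tune(judge: Judge, kept: List[Pair]) -> Judge:
    """Placeholder update: returns the judge unchanged."""
    return judge


final_judge, rates = iterate_judge(length_judge, toy_pairs, no_op_fine_tune, rounds=2)
```

With these placeholders the judge never changes, so the agreement rate stays flat across rounds; in a real setup `fine_tune` would be the step expected to raise it.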