GitHub - HaroldChen19/VistaDPO: [ICML 2025] VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models

2 min read | Saved October 29, 2025

video-understanding, machine-learning, dataset, optimization, ai-research

VistaDPO is a framework for improving video understanding in Large Video Models (LVMs) by aligning text-video preferences at three hierarchical levels: instance, temporal, and perceptive. To address video-language misalignment and hallucinations, the authors introduce VistaDPO-7k, a dataset of 7.2K annotated QA pairs, and report significant performance improvements across several benchmarks.
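
For readers unfamiliar with Direct Preference Optimization, the sketch below shows the standard DPO loss for a single preferred/rejected response pair, followed by one plausible way to combine per-level losses. The summary above does not say how VistaDPO combines its three levels, so the weighted sum, the `combine_levels` helper, the equal weights, and the `beta` value are all assumptions made for illustration, not the authors' implementation.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair.

    Inputs are sequence log-probabilities under the trained policy and a
    frozen reference model. beta=0.1 is an illustrative default.
    """
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_gain - rejected_gain)
    # -log(sigmoid(margin)) equals softplus(-margin)
    return softplus(-margin)


def combine_levels(level_losses: dict, weights: dict) -> float:
    """Hypothetical combination: a weighted sum of per-level DPO losses."""
    return sum(weights[name] * loss for name, loss in level_losses.items())


# Illustrative log-probabilities only; these are not results from the paper.
level_losses = {
    "instance": dpo_loss(-1.0, -3.0, -2.0, -2.0),
    "temporal": dpo_loss(-2.0, -2.5, -2.0, -2.0),
    "perceptive": dpo_loss(-2.0, -2.0, -2.0, -2.0),
}
weights = {"instance": 1.0, "temporal": 1.0, "perceptive": 1.0}  # assumed
total_loss = combine_levels(level_losses, weights)
```

When the policy and the reference assign identical log-probabilities, the margin is zero and the per-pair loss equals log 2. The loss falls below that value as the policy shifts probability toward the preferred response relative to the reference.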
