Quit Emailing Yourself


1 link tagged with all of: machine-learning + dataset + ai-research + video-understanding

GitHub - HaroldChen19/VistaDPO: [ICML 2025] VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models

VistaDPO is a framework for improving video understanding in Large Video Models (LVMs) by aligning text-video preferences at three hierarchical levels: instance, temporal, and perceptive. To address video-language misalignment and hallucination, the authors introduce VistaDPO-7k, a dataset of 7.2K annotated QA pairs, and report significant performance improvements across several benchmarks.

Saved by tldr-importer · Last saved October 29, 2025 · 2 min read

Tags: video-understanding, machine-learning, dataset, optimization, ai-research
