Quit Emailing Yourself


1 link tagged with all of: ai-research + machine-learning + video-understanding


Links

GitHub - HaroldChen19/VistaDPO: [ICML 2025] VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models

VistaDPO is a framework for improving video understanding in Large Video Models (LVMs) by aligning text-video preferences at three hierarchical levels: instance, temporal, and perceptive. To address video-language misalignment and hallucination, the authors also introduce VistaDPO-7k, a dataset of 7.2K annotated QA pairs, and report significant performance improvements across several benchmarks.

Saved by tldr-importer · Last saved October 29, 2025 · 2 min read

Tags: ai-research, machine-learning, video-understanding, dataset, optimization