Links
ShapeR offers a method for generating 3D shapes from image sequences. It processes input images to extract relevant data, then uses a transformer model to create a mesh representation of each object in the scene. The project includes tools for setup, data exploration, and evaluation.
Egocentric-10K is the largest dataset focused on egocentric video collected in real factory settings, featuring over 1 billion frames across nearly 193,000 clips. It includes detailed camera intrinsics and metadata for each video, making it valuable for research in human-robot interaction and computer vision.
DeepMath-103K is a dataset designed to improve mathematical reasoning in language models, featuring a broad range of challenging and diverse math problems. The problems go through rigorous decontamination to ensure fair evaluation, and their detailed structure supports various research applications. The accompanying models and code are open-sourced.
This article introduces IMAGGarment-1, a framework for generating garments with detailed control over silhouette, color, and logo placement. It features a two-stage training process that separates global appearance from local details, allowing for precise customization in fashion design. The authors also present GarmentBench, a large dataset built to support the model, with nearly 190,000 garment samples covering various design conditions.
StreamBridge is a framework that converts offline Video Large Language Models (Video-LLMs) into proactive streaming assistants, addressing limitations in multi-turn understanding and proactive response. It uses a memory buffer and a lightweight activation model for continuous engagement, and the authors also build the Stream-IT dataset for streaming video comprehension. Experiments show that StreamBridge outperforms existing models on video understanding tasks.
MS MARCO Web Search is a dataset for information retrieval research, featuring millions of real clicked query-document labels and a large corpus drawn from ClueWeb22. It supports various machine learning and retrieval tasks and serves as a benchmark for evaluating retrieval methods on large datasets. Researchers can use it to test their techniques at both small and large data scales.
VistaDPO is a new framework for optimizing video understanding in Large Video Models (LVMs) by aligning text-video preferences at three hierarchical levels: instance, temporal, and perceptive. The authors introduce VistaDPO-7k, a dataset of 7.2K annotated QA pairs, to address video-language misalignment and hallucinations, and report significant performance improvements on various benchmarks.
REGEN is a new benchmark dataset aimed at improving the ability of large language models (LLMs) to generate personalized recommendations through natural language interactions. By augmenting the Amazon Product Reviews dataset with user critiques and contextual narratives, REGEN supports more nuanced conversational recommendations that adapt to user feedback. The study demonstrates how models like LUMEN can integrate recommendation and narrative generation.