Quit Emailing Yourself

# dataset → machine-learning

8 links tagged with all of: dataset + machine-learning

Links

GitHub - facebookresearch/ShapeR: Code for the ShapeR research paper

ShapeR offers a method for generating 3D shapes from image sequences. It processes input images to extract relevant data, then uses a transformer model to create a mesh representation of each object in the scene. The project includes tools for setup, data exploration, and evaluation.

Saved by tldr-importer · Last saved February 14, 2026 · 2 min read

3d-modeling · shape-generation · computer-vision · machine-learning · dataset

builddotai/Egocentric-10K · Datasets at Hugging Face

Egocentric-10K is described as the largest dataset of egocentric video collected in real factory settings, with over 1 billion frames across nearly 193,000 clips. Each video comes with detailed camera intrinsics and metadata, which makes the dataset useful for research in human-robot interaction and computer vision.

Saved by tldr-importer · Last saved February 14, 2026 · 2 min read

dataset · egocentric · video · factory · machine-learning

GitHub - zwhe99/DeepMath: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning

DeepMath-103K is a dataset for improving mathematical reasoning in language models, covering a broad range of challenging and diverse math problems. The problems went through rigorous decontamination to ensure fair evaluation, and their detailed structure supports a variety of research applications. The accompanying models and code are open source.

Saved by tldr-importer · Last saved October 29, 2025 · 2 min read

deepmath · dataset · mathematics · machine-learning · open-source

IMAGGarment-1: A Framework for Fine-Grained Garment Generation with Enhanced Customization

This article introduces IMAGGarment-1, a framework for generating garments with detailed control over silhouette, color, and logo placement. It uses a two-stage training process that separates global appearance from local details, allowing precise customization in fashion design. To support the model, the authors also present GarmentBench, a large dataset of nearly 190,000 garment samples with various design conditions.

Saved by tldr-importer · Last saved October 29, 2025 · 1 min read

garment-generation · fashion-design · dataset · machine-learning · controllability

StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant

StreamBridge is a framework that converts offline Video Large Language Models (Video-LLMs) into proactive streaming assistants, targeting two gaps: multi-turn understanding and proactive response. It uses a memory buffer and a lightweight activation model for continuous engagement, and the authors also build the Stream-IT dataset for streaming video comprehension. In the reported experiments, StreamBridge outperforms existing models on video understanding tasks.

Saved by tldr-importer · Last saved October 29, 2025 · 2 min read

video-llms · proactive-assistant · streaming · machine-learning · dataset

GitHub - microsoft/MS-MARCO-Web-Search: A large-scale information-rich web dataset, featuring millions of real clicked query-document labels

MS MARCO Web Search is a comprehensive dataset designed for information retrieval research, featuring millions of real clicked query-document labels and a vast corpus from ClueWeb22. It supports various tasks in machine learning and retrieval systems, offering a benchmark for evaluating retrieval methods and performance across large datasets. Researchers can utilize this dataset to investigate the effectiveness of their techniques on both small and large data scales.

Saved by tldr-importer · Last saved October 29, 2025 · 7 min read

ms-marco · web-search · dataset · retrieval · machine-learning

GitHub - HaroldChen19/VistaDPO: [ICML 2025] VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models

VistaDPO is a framework for optimizing video understanding in Large Video Models (LVMs) by aligning text-video preferences at three hierarchical levels: instance, temporal, and perceptive. To address video-language misalignment and hallucinations, the authors introduce VistaDPO-7k, a dataset of 7.2K annotated QA pairs, and report significant performance improvements on various benchmarks.

Saved by tldr-importer · Last saved October 29, 2025 · 2 min read

video-understanding · machine-learning · dataset · optimization · ai-research

REGEN: Empowering personalized recommendations with natural language

REGEN is a benchmark dataset for improving how large language models (LLMs) generate personalized recommendations through natural language interactions. It augments the Amazon Product Reviews dataset with user critiques and contextual narratives, enabling more nuanced conversational recommendations that adapt to user feedback. The study demonstrates how models like LUMEN can effectively integrate recommendation and narrative generation.

Saved by tldr-importer · Last saved October 29, 2025 · 6 min read

natural-language · recommendations · dataset · machine-learning · conversational-systems