2 min read | Saved February 14, 2026
Do you care about this?
Egocentric-10K is the largest dataset focused on egocentric video collected in real factory settings, featuring over 1 billion frames across nearly 193,000 clips. It includes detailed camera intrinsics and metadata for each video, making it valuable for research in human-robot interaction and computer vision.
If you do, here's more
Egocentric-10K is a significant dataset in the field of computer vision, billed as the largest egocentric dataset available. It focuses on real-world factory environments, a departure from previous datasets collected in more controlled settings. The dataset includes 1.08 billion frames captured over 10,000 hours, spread across 192,900 video clips. Each clip runs about 180 seconds on average, providing a rich resource for analyzing human activity and object interaction in industrial contexts.
The dataset features a high level of detail: video is recorded at 1080p and 30 frames per second from a monocular head-mounted camera, with no audio, focusing purely on visual data. Each worker's folder contains a JSON file with calibrated camera parameters, so users can accurately interpret the footage. The structured format allows easy access to both video and accompanying metadata, including factory and worker identifiers, video duration, and file size.
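As a rough illustration of how such per-worker calibration files can be used, the sketch below parses camera parameters from JSON and assembles a standard pinhole intrinsics matrix. The field names (`fx`, `fy`, `cx`, `cy`) and the example values are assumptions for illustration, not the dataset's actual schema; consult the JSON files themselves for the real keys.

```python
import json

# Hypothetical calibration JSON; key names are assumed, not taken from the
# dataset. The example principal point (960, 540) is the center of a 1080p
# (1920x1080) frame, matching the resolution described above.
calib_json = '{"fx": 900.0, "fy": 900.0, "cx": 960.0, "cy": 540.0}'
params = json.loads(calib_json)

# Assemble the 3x3 pinhole camera intrinsics matrix K from the parameters.
K = [
    [params["fx"], 0.0,          params["cx"]],
    [0.0,          params["fy"], params["cy"]],
    [0.0,          0.0,          1.0],
]
```

With `K` in hand, a pixel coordinate can be back-projected into a camera-space ray, which is the typical first step for 3D reasoning over egocentric footage.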
Users can load the dataset through a simple API, enabling them to stream specific factories or workers as needed. This flexibility supports targeted analysis, which is essential for researchers focused on specific aspects of egocentric vision. The dataset is licensed under the Apache 2.0 License, making it accessible for various applications in machine learning and computer vision projects. With over 9,000 downloads in the last month, Egocentric-10K is already gaining traction among researchers.
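The targeted-streaming idea described above can be sketched in plain Python as a filter over per-clip metadata records. The record field names (`factory_id`, `worker_id`, `duration_s`, `size_bytes`) are hypothetical stand-ins for the metadata fields the dataset exposes, not its actual API.

```python
# Hypothetical per-clip metadata records; field names are assumptions
# modeled on the metadata described above (factory/worker IDs, duration,
# file size), not the dataset's real schema.
records = [
    {"factory_id": 1, "worker_id": 4, "duration_s": 180, "size_bytes": 52_000_000},
    {"factory_id": 1, "worker_id": 7, "duration_s": 175, "size_bytes": 50_500_000},
    {"factory_id": 2, "worker_id": 4, "duration_s": 182, "size_bytes": 53_100_000},
]

def select_clips(records, factory_id=None, worker_id=None):
    """Yield only the records matching the requested factory and/or worker."""
    for r in records:
        if factory_id is not None and r["factory_id"] != factory_id:
            continue
        if worker_id is not None and r["worker_id"] != worker_id:
            continue
        yield r

# Restrict analysis to a single factory, as a researcher might when
# streaming only the subset of clips they need.
subset = list(select_clips(records, factory_id=1))
```

In practice the same selection would drive which video files are actually streamed, so only the relevant factory's or worker's footage is downloaded.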