Meta has unveiled the Segment Anything Model 3 (SAM 3), a significant upgrade to its computer vision models that improves detection, segmentation, and tracking of objects in images and videos using text, exemplar, and visual prompts. SAM 3 introduces promptable concept segmentation, letting users define target objects with open-vocabulary text or exemplar prompts rather than a fixed label set. On the new Segment Anything with Concepts (SA-Co) benchmark, which stresses open-vocabulary recognition in segmentation tasks, the model shows a roughly twofold improvement over existing systems.
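The difference between a fixed label set and an open-vocabulary prompt can be illustrated with a small sketch. This is not Meta's actual SAM 3 API; the `Prompt`, `Detection`, and `segment` names here are purely illustrative stand-ins for the prompt types the article describes:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Prompt:
    """A concept prompt: free-form text and/or an exemplar image crop."""
    text: Optional[str] = None        # open-vocabulary phrase, e.g. "striped mug"
    exemplar: Optional[bytes] = None  # image crop showing one instance of the concept

@dataclass
class Detection:
    label: str
    score: float

def segment(detections: List[Detection], prompt: Prompt) -> List[Detection]:
    """Toy stand-in for concept segmentation: keep every detected instance
    whose label matches the prompt text. A fixed-vocabulary system can only
    return labels from its training taxonomy; an open-vocabulary text prompt
    can name any concept the user cares about."""
    if prompt.text is None:
        return detections
    query = prompt.text.lower()
    return [d for d in detections if query in d.label.lower()]

scene = [
    Detection("striped mug", 0.91),
    Detection("plain mug", 0.88),
    Detection("laptop", 0.95),
]
matches = segment(scene, Prompt(text="striped mug"))
print([d.label for d in matches])  # ['striped mug']
```

The key point the sketch captures is that the prompt, not a pre-baked taxonomy, defines what counts as an object of interest.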
The release includes model checkpoints, evaluation datasets, and fine-tuning code, making it accessible for wider use. Meta also launched the Segment Anything Playground, a platform where users can experiment with SAM and edit media. SAM 3's capabilities will be integrated into various applications, such as Instagram's Edits app, enhancing video creation with new effects targeted at specific objects or people. Moreover, the SAM 3D suite extends the functionality to 3D object reconstruction, supporting features like Facebook Marketplace's "View in Room," which helps users visualize home decor items in their own spaces.
Meta tackled the challenge of obtaining high-quality annotated data by building a hybrid data engine that pairs human and AI annotators. The pipeline uses AI models to generate captions and initial segmentation masks, which human annotators then verify, speeding up annotation by up to five times for negative prompts and 36% for positive prompts. Covering over 4 million unique concepts, this approach significantly improves data quality and coverage, addressing a longstanding bottleneck in building diverse training sets for AI models.
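The propose-then-verify loop described above can be sketched as follows. This is a hypothetical illustration of the described workflow, not Meta's pipeline code; `ai_propose`, `human_verify`, and `annotate` are invented names, and the human stage is simulated with a seeded random accept rate:

```python
import random

def ai_propose(image_id: str):
    """Stand-in for the AI stage: caption an image and propose candidate masks."""
    caption = f"caption for {image_id}"
    masks = [f"{image_id}-mask-{i}" for i in range(3)]
    return caption, masks

def human_verify(masks, accept_rate=0.8, rng=None):
    """Stand-in for the human stage: annotators accept or reject each proposal.
    A seeded RNG simulates the accept/reject decisions deterministically."""
    rng = rng or random.Random(0)
    return [m for m in masks if rng.random() < accept_rate]

def annotate(images):
    """Hybrid loop: the AI proposes, humans verify, and only verified
    masks enter the final dataset."""
    dataset = []
    for image_id in images:
        caption, proposals = ai_propose(image_id)
        verified = human_verify(proposals)
        dataset.append({"image": image_id, "caption": caption, "masks": verified})
    return dataset

corpus = annotate(["img-001", "img-002"])
```

The speedup the article cites comes from humans only checking machine proposals instead of drawing every mask from scratch; the verification step is what keeps the 4-million-concept coverage from degrading quality.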