6 min read | Saved February 14, 2026
Do you care about this?
Meta has introduced Segment Anything Model 3 (SAM 3), which enhances object detection, segmentation, and tracking in images and videos using text and visual prompts. The release includes model checkpoints, a new playground for experimentation, and applications in platforms like Facebook Marketplace and Instagram's Edits app. SAM 3 also features a data engine that combines AI and human annotators to speed up image and video annotation.
If you do, here's more
Meta has unveiled the Segment Anything Model 3 (SAM 3), a significant upgrade in computer vision that enhances detection, segmentation, and tracking of objects in images and videos using text, exemplar, and visual prompts. SAM 3 introduces promptable concept segmentation, letting users define objects with open-vocabulary text or exemplar prompts rather than a fixed label set. The model has shown a twofold improvement over existing systems on the new Segment Anything with Concepts (SA-Co) benchmark, which evaluates segmentation over a far larger concept vocabulary than prior benchmarks.
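The key shift in promptable concept segmentation is that a prompt selects every matching instance in a scene, not one object from a closed label set. The toy sketch below illustrates that idea only; the `Region` type, `matches`, and `segment_by_concept` are made-up names, and simple substring matching stands in for the text-image embedding similarity a real model would compute. This is not SAM 3's actual API.

```python
from dataclasses import dataclass

@dataclass
class Region:
    mask_id: int
    concept: str  # open-vocabulary description attached to the region

def matches(prompt: str, concept: str) -> bool:
    # Stand-in for embedding similarity: substring match keeps the
    # example dependency-free.
    return prompt in concept

def segment_by_concept(regions, prompt):
    """Return the ids of ALL regions whose concept matches the prompt."""
    return [r.mask_id for r in regions if matches(prompt, r.concept)]

regions = [
    Region(0, "striped red umbrella"),
    Region(1, "blue umbrella"),
    Region(2, "park bench"),
]
print(segment_by_concept(regions, "umbrella"))  # -> [0, 1]
```

Note how the prompt "umbrella" returns both umbrella instances at once, which is the behavior a fixed-vocabulary detector trained without that class could not offer.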
The release includes model checkpoints, evaluation datasets, and fine-tuning code, making it accessible for wider use. Meta also launched the Segment Anything Playground, a platform where users can experiment with SAM 3 and modify their own images and videos. SAM 3's capabilities will be integrated into various applications, such as Instagram's Edits app, enhancing video creation with new effects targeted at specific objects or people. Moreover, the SAM 3D suite expands functionality to 3D object reconstruction, supporting features like Facebook Marketplace's "View in Room," which helps users visualize home decor items in their own spaces.
Meta tackled the challenge of obtaining high-quality annotated data by developing a new hybrid data engine that combines human and AI annotators. This system speeds up the annotation process, achieving up to five times faster results for negative prompts and 36% faster for positive prompts. The pipeline utilizes AI models to generate captions and initial segmentation masks, which are then verified by human annotators. With over 4 million unique concepts, this approach significantly enhances data quality and coverage, addressing a longstanding issue in creating diverse training sets for AI models.
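The data-engine loop described above can be pictured as a triage step: AI models draft captions and candidate masks with confidences, an AI verifier auto-accepts the confident ones, and only the remainder is queued for human annotators. The sketch below is a minimal illustration of that division of labor; the `triage` function, the sample proposals, and the 0.8 threshold are all illustrative assumptions, not details from Meta's pipeline.

```python
def triage(proposals, auto_accept_threshold=0.8):
    """Split AI mask proposals into auto-accepted and human-review queues."""
    accepted = [m for m, conf in proposals if conf >= auto_accept_threshold]
    human_queue = [m for m, conf in proposals if conf < auto_accept_threshold]
    return accepted, human_queue

# Pretend output of the captioning + initial-segmentation models:
proposals = [("dog", 0.95), ("leash", 0.60), ("fire hydrant", 0.88)]
accepted, human_queue = triage(proposals)
print(accepted)     # -> ['dog', 'fire hydrant']
print(human_queue)  # -> ['leash']
```

The speedups reported in the article come from exactly this effect: human time is spent only on the proposals the models are unsure about, while confident proposals flow straight into the training set.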