Saved February 14, 2026
Do you care about this?
SAM Audio is a new tool from Meta that lets users isolate sounds from audio and video sources using text, visual, or temporal prompts. It can separate general sounds, music, and speech with high accuracy, allowing for clearer audio experiences. This model also includes an open-source evaluation dataset for audio separation.
If you do, here's more
Meta has introduced the Segment Anything Model Audio (SAM Audio), a cutting-edge tool for separating sounds from audio and audiovisual sources. Users can isolate general sounds, music, and speech using various prompt types: text prompts that describe the desired audio, visual prompts that involve selecting parts of a video, and span prompts that allow users to specify a time range for the audio they want to extract. SAM Audio's capabilities are designed to make audio separation intuitive and efficient.
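To make the three prompt types concrete, here is a minimal sketch of how a request combining them might be represented. This is not SAM Audio's actual API: the `SeparationPrompt` class, its field names, and the `span_to_samples` helper are all hypothetical and exist only to illustrate the idea of text, visual, and span prompts.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class SeparationPrompt:
    """Hypothetical prompt container; NOT the SAM Audio API."""
    # Text prompt: a description of the sound to isolate.
    text: Optional[str] = None
    # Visual prompt (assumed shape): (x, y, width, height) picked in a video frame.
    visual_region: Optional[Tuple[int, int, int, int]] = None
    # Span prompt: (start, end) of the wanted audio, in seconds.
    span: Optional[Tuple[float, float]] = None

    def __post_init__(self) -> None:
        if self.text is None and self.visual_region is None and self.span is None:
            raise ValueError("at least one prompt type is required")
        if self.span is not None:
            start, end = self.span
            if start < 0 or end <= start:
                raise ValueError("span must satisfy 0 <= start < end")

    def kinds(self) -> List[str]:
        """Return the names of the prompt types that are set."""
        names = []
        if self.text is not None:
            names.append("text")
        if self.visual_region is not None:
            names.append("visual")
        if self.span is not None:
            names.append("span")
        return names


def span_to_samples(span: Tuple[float, float], sample_rate: int) -> Tuple[int, int]:
    """Convert a (start, end) span in seconds to sample indices."""
    start, end = span
    return round(start * sample_rate), round(end * sample_rate)


# Example: isolate a barking dog between 1.0 s and 2.5 s of a 16 kHz clip.
prompt = SeparationPrompt(text="dog barking", span=(1.0, 2.5))
print(prompt.kinds())
print(span_to_samples(prompt.span, 16_000))
```

In this sketch the prompt types are optional fields that can be combined in one request. Whether SAM Audio itself accepts several prompt types at once is not stated in the article, so treat the combination as an assumption.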
The model excels in three main areas. It effectively distinguishes everyday sounds, such as traffic and dogs barking, which can be particularly useful for noise removal. When it comes to music, SAM Audio isolates instruments and vocals with high accuracy, competing with leading music separation technologies. For speech, the model extracts voices from background noise, enhancing clarity and enabling better communication in noisy environments.
SAM Audio achieves state-of-the-art performance with a generative separation model: a flow-matching Diffusion Transformer that operates in a DAC-VAE latent space, which allows for high-quality audio generation. The model is also open source and comes with an evaluation dataset for prompted audio separation, which makes it more useful to developers and researchers. Collaborations with 2gether-International and Starkey point to real-world applications, particularly in empowering disabled entrepreneurs and improving hearing technologies.
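For readers unfamiliar with flow matching, the sketch below shows the general sampling idea in a toy setting: start from noise and integrate a velocity field from t = 0 to t = 1 to arrive at a target latent. It is not SAM Audio's implementation. The closed-form `make_toy_velocity` stands in for the learned Diffusion Transformer, the three-number "latent" stands in for a DAC-VAE encoding, and all function names are hypothetical.

```python
import random
from typing import Callable, List

Vector = List[float]


def euler_sample(velocity: Callable[[Vector, float], Vector],
                 x0: Vector, steps: int) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # t stays below 1, so the toy velocity never divides by zero
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def make_toy_velocity(target: Vector) -> Callable[[Vector, float], Vector]:
    """Toy stand-in for a learned network (assumption, not the real model).

    For the straight-line path x_t = (1 - t) * noise + t * target, the
    velocity that carries any point x at time t to a single known target
    is (target - x) / (1 - t).
    """
    def velocity(x: Vector, t: float) -> Vector:
        return [(ti - xi) / (1.0 - t) for ti, xi in zip(target, x)]
    return velocity


rng = random.Random(0)
target_latent = [0.5, -1.0, 2.0]                       # pretend latent of the wanted sound
noise = [rng.gauss(0.0, 1.0) for _ in target_latent]   # starting point at t=0
sample = euler_sample(make_toy_velocity(target_latent), noise, steps=8)
print(sample)
```

With this toy velocity the sampler ends at the target latent up to floating-point rounding. In a real system the velocity would be predicted by a trained network conditioned on the audio mixture and the prompt, and a decoder would turn the final latent back into a waveform.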