Saved February 14, 2026
Do you care about this?
ElevenLabs introduced Scribe v2 Realtime, a Speech to Text model that transcribes live speech with latency under 150 ms. It supports multiple languages and includes automatic language detection and voice activity detection, which makes it suitable for voice agents and real-time captioning. The model achieves 93.5% accuracy across 30 common European and Asian languages and is available through the ElevenLabs API.
If you do, here's more
Scribe v2 Realtime, launched at the London Summit on February 11th, is a new low-latency Speech to Text model that delivers live transcription in under 150 milliseconds. It targets applications that depend on real-time communication, such as voice agents, meeting assistants, and captioning services. It transcribes English, French, German, Italian, Spanish, Portuguese, and 90 additional languages, so a single conversation can span several languages.
Several features support these use cases. Automatic language detection lets speakers switch languages mid-conversation without interruption. A "negative latency" feature predicts the next word and punctuation, and manual commit gives users control over when transcript segments are finalized. The model reaches 93.5% accuracy across 30 common European and Asian languages, and its performance on 500 samples with background noise indicates that transcription stays reliable in challenging audio conditions.
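To make the manual commit idea concrete, here is a minimal local sketch of how a client might separate still-changing partial text from finalized segments. It does not call the ElevenLabs API; the class `TranscriptBuffer` and its methods `on_partial`, `commit`, and `display` are hypothetical names chosen for illustration, and the real event and message names may differ.

```python
from dataclasses import dataclass, field


@dataclass
class TranscriptBuffer:
    """Hypothetical client-side buffer; not part of any ElevenLabs SDK."""

    committed: list[str] = field(default_factory=list)  # finalized segments
    partial: str = ""  # latest in-progress hypothesis

    def on_partial(self, text: str) -> None:
        # Each newer partial hypothesis replaces the previous one.
        self.partial = text

    def commit(self) -> str:
        # Manual commit: freeze the current partial as a finalized segment.
        segment = self.partial.strip()
        if segment:
            self.committed.append(segment)
        self.partial = ""
        return segment

    def display(self) -> str:
        # Finalized segments first, then the still-changing partial text.
        parts = self.committed + ([self.partial] if self.partial else [])
        return " ".join(parts)


buf = TranscriptBuffer()
for hypothesis in ("hello", "hello wor", "hello world"):
    buf.on_partial(hypothesis)  # partials overwrite each other
buf.commit()                    # caller decides when the segment is final
buf.on_partial("bonjour")       # next segment starts as a new partial
print(buf.display())
```

In this sketch the caller, not the model, decides when a segment becomes final, which is the kind of control the manual commit feature is described as providing.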
Scribe v2 Realtime is built with enterprise compliance in mind and adheres to standards such as SOC 2, ISO 27001, and HIPAA. It offers data residency options, including the EU and India, and a zero retention mode for sensitive information. The model is available through the ElevenLabs API, where developers can use it to build voice assistants for applications ranging from customer support to sales.