4 min read
|
Saved February 14, 2026
Do you care about this?
Mistral has released two new Voxtral speech-to-text models: Voxtral Mini Transcribe V2 for batch processing and Voxtral Realtime for live applications. Both models support 13 languages and are designed for accurate, efficient transcription in use cases such as meeting transcription and voice applications.
If you do, here's more
Mistral has launched Voxtral Transcribe 2, a release comprising two speech-to-text models: Voxtral Mini Transcribe V2 for batch transcription and Voxtral Realtime for live applications. Mini Transcribe V2 achieves a 4% word error rate across 13 languages at a price of $0.003 per minute. Voxtral Realtime is optimized for real-time use, with latency configurable down to under 200 ms, which matters for applications that need immediate feedback, such as voice agents.
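To make the per-minute price concrete, here is a minimal sketch of the arithmetic. It assumes billing is strictly linear in audio duration with no rounding or minimum charge, which the article does not confirm; the function name `estimate_batch_cost` is illustrative and not part of any Mistral SDK.

```python
# Price quoted in the article for Voxtral Mini Transcribe V2.
PRICE_PER_MINUTE_USD = 0.003


def estimate_batch_cost(duration_seconds: float,
                        price_per_minute: float = PRICE_PER_MINUTE_USD) -> float:
    """Estimate the transcription cost in USD for one recording.

    Assumption: cost scales linearly with duration, with no rounding
    to whole minutes and no minimum charge.
    """
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    return duration_seconds / 60 * price_per_minute


if __name__ == "__main__":
    one_hour = estimate_batch_cost(60 * 60)
    print(f"Estimated cost for a one-hour recording: ${one_hour:.2f}")
```

Under that assumption, a one-hour recording comes to about $0.18.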
Voxtral Realtime employs a novel streaming architecture, allowing it to transcribe audio as it streams, rather than in chunks. This provides near-offline accuracy even at low latency. It supports 13 languages and is designed to run efficiently on edge devices, ensuring privacy. The model's weights are available under the Apache 2.0 license, promoting ease of deployment.
Mini Transcribe V2 adds features that improve transcription quality: speaker diarization, context biasing for specific vocabulary, and word-level timestamps, all of which matter for applications like meeting transcription or audio search. It can process recordings up to three hours long and is robust in noisy environments. Mistral also introduced an audio playground within Mistral Studio, where users can test the transcription models directly with various audio files and settings. This makes it easier for developers to evaluate the models before integrating transcription into their applications.
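As an illustration of how word-level timestamps and speaker diarization combine in a meeting transcript, the sketch below merges consecutive words from the same speaker into turns. The `Word` and `Turn` shapes are hypothetical and chosen for this example only; they are not the actual response format of the Mistral API, which the article does not describe.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    """Hypothetical word record: text, start/end in seconds, speaker label."""
    text: str
    start: float
    end: float
    speaker: str


@dataclass
class Turn:
    """A run of consecutive words spoken by one speaker."""
    speaker: str
    start: float
    end: float
    text: str


def group_into_turns(words: List[Word]) -> List[Turn]:
    """Merge adjacent words with the same speaker label into turns.

    Assumes `words` is already sorted by start time.
    """
    turns: List[Turn] = []
    for word in words:
        if turns and turns[-1].speaker == word.speaker:
            # Same speaker continues: extend the current turn.
            turns[-1].end = word.end
            turns[-1].text += " " + word.text
        else:
            # Speaker changed (or first word): start a new turn.
            turns.append(Turn(word.speaker, word.start, word.end, word.text))
    return turns


if __name__ == "__main__":
    sample = [
        Word("Good", 0.0, 0.3, "speaker_1"),
        Word("morning", 0.3, 0.8, "speaker_1"),
        Word("Hello", 1.0, 1.4, "speaker_2"),
    ]
    for turn in group_into_turns(sample):
        print(f"[{turn.start:.1f}-{turn.end:.1f}] {turn.speaker}: {turn.text}")
```

Each turn keeps the start time of its first word and the end time of its last, which is the kind of structure meeting notes or audio search could index against.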