1 link tagged with all of: machine-learning + multimodal + speech-understanding
Links
Voxtral Mini and Voxtral Small are two multimodal audio chat models designed to understand both spoken audio and text. They achieve state-of-the-art performance across a range of audio benchmarks while preserving strong text capabilities, and Voxtral Small is compact enough to run locally. Both models have a 32K-token context window, which lets them handle long audio files and multi-turn conversations. The release also includes three new benchmarks for evaluating speech understanding on knowledge and trivia questions.