Saved February 14, 2026
Do you care about this?
Omnilingual ASR is a speech recognition system that supports over 1,600 languages, including many that had no ASR support before. New languages can be added from a handful of examples, with no specialist skills required. The system is designed for accessibility and offers several model options for different use cases.
If you do, here's more
Omnilingual ASR is an open-source speech recognition system that supports over 1,600 languages, many of which no earlier ASR technology has covered. Users can add a new language quickly from only a few paired examples, without extensive datasets or specialized skills. Its architecture combines zero-shot learning with a flexible model family, with the aim of making speech technology accessible to a wider range of communities and researchers.
The 7B-LLM-ASR model demonstrates state-of-the-art performance, achieving character error rates (CER) below 10% for 78% of the supported languages. The model suite includes checkpoints for improved accuracy and a variant that can handle audio of unlimited length, although fine-tuning recipes for that variant are not yet available. Installation takes only a few basic commands, and detailed guides cover transcription, data preparation, and model training.
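To make the character error rate figure concrete, here is a minimal Python sketch of how CER is commonly computed: the character-level edit distance between a reference transcript and the model output, divided by the reference length. The function names (`edit_distance`, `cer`, `share_below`) are hypothetical and not part of Omnilingual ASR, and the summary does not say how the project normalizes text or aggregates scores, so treat this as an illustration of the metric rather than the project's evaluation code.

```python
from __future__ import annotations


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference characters into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference character
                curr[j - 1] + 1,     # insert a hypothesis character
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def share_below(cer_by_lang: dict[str, float], threshold: float = 0.10) -> float:
    """Fraction of languages whose CER is strictly below the threshold."""
    if not cer_by_lang:
        raise ValueError("need at least one language")
    hits = sum(1 for value in cer_by_lang.values() if value < threshold)
    return hits / len(cer_by_lang)
```

A statement such as "CER below 10% for 78% of languages" corresponds to `share_below(scores, 0.10)` returning 0.78, where `scores` maps each language code to its measured CER.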
Developed using the fairseq2 toolkit, Omnilingual ASR supports batch processing and various audio formats, but certain models currently accept only audio files shorter than 40 seconds. The project also provides a large-scale multilingual speech dataset on HuggingFace, which can be used directly with the inference pipeline for testing and evaluation. A provided code snippet lists the supported languages, each identified by a language-and-script code such as "eng_Latn" for English in Latin script.
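The two constraints mentioned here, the language code format and the 40-second limit, can be checked before any audio reaches a model. The sketch below uses only the Python standard library and does not call Omnilingual ASR itself. It assumes that every code follows the shape of the single example "eng_Latn" (three lowercase letters, an underscore, then a four-letter script tag), and it handles WAV files only. The names `parse_lang_code`, `wav_duration_seconds`, and `within_limit` are hypothetical helpers, not part of the project.

```python
import re
import wave

# Assumed shape, inferred from the single example "eng_Latn":
# a three-letter language code, an underscore, and a four-letter script tag.
LANG_CODE = re.compile(r"([a-z]{3})_([A-Z][a-z]{3})")

# Limit quoted in the text for certain models.
MAX_SECONDS = 40.0


def parse_lang_code(code: str) -> tuple:
    """Split a code such as 'eng_Latn' into (language, script)."""
    match = LANG_CODE.fullmatch(code)
    if match is None:
        raise ValueError(f"unexpected language code format: {code!r}")
    return match.group(1), match.group(2)


def wav_duration_seconds(path: str) -> float:
    """Duration of a WAV file, computed from its frame count and sample rate."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def within_limit(path: str, max_seconds: float = MAX_SECONDS) -> bool:
    """True if the WAV file is strictly shorter than the limit."""
    return wav_duration_seconds(path) < max_seconds
```

For example, `parse_lang_code("eng_Latn")` returns `("eng", "Latn")`. Files that fail `within_limit` would need to be split into shorter segments or sent to the variant that handles unlimited audio length.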