1 link tagged with all of: multimodal + machine-learning + multilingual
Links
Qwen3-Omni is a multilingual omni-modal foundation model that processes text, images, audio, and video and returns real-time streaming responses. It introduces architectural changes aimed at improving performance, supports 119 text languages, and comes with detailed cookbooks covering applications such as speech recognition, audio captioning, and video analysis. The model is available on Hugging Face and ModelScope, along with usage recommendations for best performance.