2 links tagged with all of: multimodal + vision-language + model
Links
Youtu-VL is a 4B-parameter vision-language model that handles both vision-centric and general multimodal tasks without task-specific modules. It uses an autoregressive supervision method to strengthen visual understanding while preserving fine-grained visual detail, and it supports applications ranging from image classification to visual question answering.
Kimi-VL is an open-source Mixture-of-Experts vision-language model with only 2.8B activated parameters that excels at multimodal reasoning and long-context understanding. It performs strongly on multi-turn interaction, video comprehension, and mathematical reasoning, competing with much larger models while remaining efficient. The latest variant, Kimi-VL-A3B-Thinking-2506, further improves reasoning and visual perception, reaching state-of-the-art results on several benchmarks.