Quit Emailing Yourself

# deep-learning → model

2 links tagged with all of: deep-learning + model

Links

GitHub - TencentCloudADP/youtu-vl: Youtu-VL: Unleashing Visual Potential via Unified Vision-Language Supervision

Youtu-VL is a 4B-parameter vision-language model that performs well on both vision-centric and general multimodal tasks without task-specific modules. It relies on unified vision-language autoregressive supervision to strengthen visual understanding and preserve fine-grained detail. The model supports applications ranging from image classification to visual question answering.

Saved by tldr-importer · Last saved February 14, 2026 · 3 min read

Tags: vision-language, multimodal, model, deep-learning, ai

allenai/olmOCR-2-7B-1025 · Hugging Face

The olmOCR-2-7B-1025 model is a fine-tuned version of Qwen2.5-VL-7B-Instruct, designed to improve optical character recognition (OCR), especially for difficult content such as math equations and tables. The FP8 version is recommended for practical use, and the model can process documents at scale through the olmOCR toolkit. It performs strongly on several OCR benchmarks.

Saved by tldr-importer · Last saved October 29, 2025 · 2 min read

Tags: ocr, model, allenai, deep-learning, toolkit