Quit Emailing Yourself

# multimodal → open-source → computer-vision

2 links tagged with all of: multimodal + open-source + computer-vision

Links

BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset

The paper presents BLIP3-o, a family of fully open unified multimodal models that handle both image understanding and image generation. It uses a diffusion transformer to generate CLIP image features, advocates a sequential pretraining strategy, and introduces a high-quality dataset, BLIP3o-60k, to improve benchmark performance. The models, code, and datasets are all open-sourced to support further research.

Saved by tldr-importer · Last saved October 29, 2025 · 2 min read

Tags: multimodal, image-generation, computer-vision, deep-learning, open-source

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

InternVL3.5 is a new family of open-source multimodal models that improves versatility, reasoning capability, and inference efficiency. Its key innovation is the Cascade Reinforcement Learning framework, which significantly improves performance on reasoning tasks. A Visual Resolution Router complements it by optimizing the resolution of visual tokens. The models achieve notable performance gains and support advanced capabilities such as GUI interaction and embodied agency, making them competitive with leading commercial models.

Saved by tldr-importer · Last saved October 29, 2025 · 2 min read

Tags: multimodal, reasoning, reinforcement-learning, open-source, computer-vision