Ollama has introduced a new engine that supports multimodal models, with an emphasis on improved accuracy, model modularity, and memory management. The update enables tighter integration of vision and text models, strengthening local inference for applications such as image recognition and visual reasoning. Future development will focus on supporting longer context sizes and enabling more advanced functionality.
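To make the vision-plus-text workflow concrete, here is a minimal sketch of querying a multimodal model through Ollama's Python client. The model tag `llama3.2-vision` and the image path are placeholders (assumptions, not part of the announcement); any vision-capable model pulled into a local Ollama install should work the same way.

```python
# Minimal sketch: ask a locally served vision model about an image via the
# `ollama` Python client. Assumes the Ollama server is running and a
# vision-capable model has been pulled (the tag below is an example).
import ollama

response = ollama.chat(
    model="llama3.2-vision",  # assumed vision-capable model tag
    messages=[
        {
            "role": "user",
            "content": "Describe what is happening in this image.",
            "images": ["./example.jpg"],  # local image path sent alongside the text prompt
        }
    ],
)

print(response["message"]["content"])
```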
R-4B is a multimodal large language model built for general-purpose auto-thinking: it dynamically switches between thinking and non-thinking modes based on task complexity. A two-stage training approach improves response efficiency and reduces computational cost, and the model achieves state-of-the-art performance among comparable models. R-4B is open source and gives users direct control over its thinking behavior.
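As a rough illustration of that user-facing mode control, the sketch below loads the model with Hugging Face transformers and requests automatic mode selection. The repository id `YannQi/R-4B` and the `thinking_mode` chat-template argument are assumptions for illustration only; consult the model card for the exact interface.

```python
# Minimal sketch of toggling R-4B's thinking behavior from Python.
# Assumptions: the model is hosted on Hugging Face under "YannQi/R-4B" and its
# chat template accepts a `thinking_mode` argument ("auto" / "thinking" /
# "non-thinking"); both names are illustrative, not confirmed by the source.
import torch
from transformers import AutoModel, AutoProcessor

MODEL_ID = "YannQi/R-4B"  # assumed repository id

processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModel.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, trust_remote_code=True
).to("cuda").eval()

messages = [{"role": "user", "content": "How many prime numbers are below 30?"}]

# "auto" lets the model decide whether the query warrants an explicit
# reasoning trace; "non-thinking" would force a direct answer instead.
prompt = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    thinking_mode="auto",  # assumed template argument for mode control
)

inputs = processor(text=prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=512)

print(processor.decode(
    output_ids[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
))
```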
The repository implements the method from "Learning Compact Vision Tokens for Efficient Large Multimodal Models," which improves inference efficiency by fusing spatially adjacent vision tokens through a Multi-Block Token Fusion module. Experimental results show that the approach achieves competitive performance on a range of vision-language benchmarks while using only 25% of the baseline vision tokens.
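The PyTorch sketch below illustrates the underlying idea under simplifying assumptions (a 2x2 fusion block and a single linear projection): spatially adjacent vision tokens are concatenated and projected into one compact token, cutting the token count to a quarter of the baseline. The actual Multi-Block Token Fusion module is more involved; this is only a conceptual sketch.

```python
# Conceptual sketch: fuse each 2x2 block of spatially adjacent vision tokens
# into a single token, reducing the token count to 25% of the baseline.
# The block size and the linear fusion layer are simplifying assumptions.
import torch
import torch.nn as nn


class BlockTokenFusion(nn.Module):
    def __init__(self, dim: int, block: int = 2):
        super().__init__()
        self.block = block
        # Project the concatenated block of tokens back to the original width.
        self.fuse = nn.Linear(dim * block * block, dim)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, H*W, dim) -- vision tokens laid out on an H x W grid
        b, n, d = tokens.shape
        h = w = int(n ** 0.5)  # assumes a square token grid
        x = tokens.view(b, h, w, d)
        # Group each block x block neighborhood into one long vector ...
        x = x.view(b, h // self.block, self.block, w // self.block, self.block, d)
        x = x.permute(0, 1, 3, 2, 4, 5).reshape(
            b, (h // self.block) * (w // self.block), -1
        )
        # ... and fuse it into a single compact token.
        return self.fuse(x)


if __name__ == "__main__":
    vision_tokens = torch.randn(1, 576, 1024)   # e.g. a 24x24 grid from a ViT encoder
    fused = BlockTokenFusion(dim=1024)(vision_tokens)
    print(fused.shape)                          # torch.Size([1, 144, 1024]) -> 25% of the tokens
```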