3 links tagged with all of: multimodal + visual-reasoning
Links
The article discusses the launch of GLM-4.6V and GLM-4.5V, two advanced vision-language models. GLM-4.6V features a 128K context and supports multimodal inputs, while GLM-4.5V excels in visual reasoning across various benchmarks. Both models offer distinct capabilities for image and video analysis.
Baidu released ERNIE-4.5-VL-28B-A3B-Thinking, an AI model that Baidu claims outperforms Google's and OpenAI's offerings in visual reasoning while using fewer computing resources. The model features a dynamic image analysis capability that mimics human problem-solving, and it is designed for enterprise applications such as document processing and manufacturing quality control.
OpenAI's latest models, o3 and o4-mini, enhance visual reasoning by integrating image processing directly into their chain-of-thought, enabling more thorough analysis and problem-solving. These models significantly outperform their predecessors across a range of multimodal benchmarks, marking a notable step forward in multimodal reasoning.