3 links tagged with all of: multimodal + dataset
Links
OmniSVG is a unified framework for generating high-quality scalable vector graphics (SVG) with pre-trained Vision-Language Models (VLMs), decoupling structural logic from low-level geometry. It introduces the MMSVG-2M dataset of two million annotated SVG assets, supports multiple generation modalities, and outperforms existing methods on diverse creative tasks. The model handles complexity ranging from simple icons to intricate illustrations, offering flexibility for professional design workflows.
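As a rough illustration of what decoupling structure from geometry can mean for SVG generation, here is a minimal sketch in which drawing commands become structural tokens while coordinates are quantized into separate geometry tokens. The tokenizer, bin count, and token names are assumptions for illustration, not OmniSVG's actual scheme.

```python
# Hypothetical sketch: serializing an SVG path so that command structure
# ("M", "L", "Z", ...) and quantized coordinates become separate tokens.
from typing import List, Tuple

N_BINS = 200  # assumed coordinate quantization resolution

def quantize(coord: float, lo: float = 0.0, hi: float = 200.0) -> int:
    """Map a continuous coordinate into one of N_BINS discrete buckets."""
    clipped = min(max(coord, lo), hi)
    return int((clipped - lo) / (hi - lo) * (N_BINS - 1))

def tokenize_path(commands: List[Tuple[str, List[float]]]) -> List[str]:
    """Emit one structural token per command, then one token per coordinate."""
    tokens = []
    for cmd, coords in commands:
        tokens.append(f"<CMD_{cmd}>")  # structural logic
        tokens.extend(f"<POS_{quantize(c)}>" for c in coords)  # geometry
    return tokens

# A triangle: "M 10 10 L 100 10 L 55 90 Z"
path = [("M", [10, 10]), ("L", [100, 10]), ("L", [55, 90]), ("Z", [])]
print(tokenize_path(path))
```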
Mini-o3 is a system for tool-based visual reasoning that supports deep, multi-turn reasoning and achieves state-of-the-art performance on visual search tasks. It uses an over-turn masking strategy to manage response lengths during reinforcement learning, combined with a dataset designed for exploratory reasoning. Open-source code and models are provided to facilitate reproducibility and further research.
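A minimal sketch of the over-turn masking idea as described: rollouts that exceed the turn budget are dropped from the policy-gradient loss rather than penalized, so long exploratory trajectories are not punished for running out of turns. Function names, tensor shapes, and the loss form are illustrative assumptions, not Mini-o3's code.

```python
import torch

def masked_policy_loss(logprobs, advantages, exceeded_turn_limit):
    """
    logprobs:            (batch,) summed log-probs of each sampled trajectory
    advantages:          (batch,) advantage estimates
    exceeded_turn_limit: (batch,) bool, True if the rollout was cut off
    Over-turn masking: cut-off rollouts contribute zero loss instead of a
    negative signal, so multi-turn exploration is never discouraged.
    """
    keep = (~exceeded_turn_limit).float()
    per_traj = -logprobs * advantages * keep
    denom = keep.sum().clamp(min=1.0)  # average only over kept rollouts
    return per_traj.sum() / denom

# Toy usage: the third rollout exceeded the turn budget and is masked out.
lp = torch.tensor([-1.2, -0.8, -2.5])
adv = torch.tensor([0.5, -0.3, 1.0])
over = torch.tensor([False, False, True])
print(masked_policy_loss(lp, adv, over))
```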
Pico-Banana-400K is a large-scale dataset of 400,000 images for text-guided image editing. It addresses limitations of existing datasets by providing high-quality, diverse edit pairs generated from real photographs, supporting research in multimodal image editing. The dataset includes specialized subsets for multi-turn editing, preference research, and instruction summarization.
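For a concrete picture of how a multi-turn editing subset might be organized, here is a hypothetical record layout: one source photograph followed by a chain of instruction/result pairs. The schema, field names, and paths are assumptions for illustration, not Pico-Banana-400K's published format.

```python
# Hypothetical record layout for a multi-turn editing example; all field
# names and paths are illustrative, not the dataset's actual schema.
from dataclasses import dataclass, field
from typing import List

@dataclass
class EditTurn:
    instruction: str  # text-guided edit, e.g. "make the sky overcast"
    image_path: str   # image after applying this turn's edit

@dataclass
class MultiTurnExample:
    source_image: str                              # original real photograph
    turns: List[EditTurn] = field(default_factory=list)

example = MultiTurnExample(
    source_image="photos/0001.jpg",
    turns=[
        EditTurn("remove the power lines", "edits/0001_t1.jpg"),
        EditTurn("make the sky overcast", "edits/0001_t2.jpg"),
    ],
)
print(len(example.turns), "chained edits")
```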