7 links tagged with all of: deep-learning + image-generation
Links
Liquid is an auto-regressive model that unifies visual comprehension and generation by tokenizing images into discrete codes and learning them alongside text tokens. The multimodal large language model operates in a single shared feature space, so it can understand and generate images without relying on external visual embeddings. Liquid is released in multiple sizes and is used to study the scaling laws of multimodal models, revealing mutual benefits between understanding and generation tasks.
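A minimal sketch of the core idea, unified next-token prediction over a single token stream, assuming a VQ tokenizer whose image codes are simply appended to the text vocabulary; the vocabulary sizes and the tiny transformer are placeholders, not Liquid's architecture:

```python
import torch
import torch.nn as nn

TEXT_VOCAB = 32000                 # hypothetical text vocabulary size
IMAGE_CODES = 8192                 # hypothetical VQ codebook size
VOCAB = TEXT_VOCAB + IMAGE_CODES   # shared vocabulary: image codes after text ids

class TinyUnifiedLM(nn.Module):
    def __init__(self, vocab=VOCAB, dim=256, heads=4, layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, layers)
        self.head = nn.Linear(dim, vocab)

    def forward(self, ids):
        # Causal mask so every position only attends to earlier tokens
        mask = nn.Transformer.generate_square_subsequent_mask(ids.size(1))
        return self.head(self.backbone(self.embed(ids), mask=mask))

# A prompt-plus-image sample is just one token stream: text ids, then image codes
text_ids = torch.randint(0, TEXT_VOCAB, (1, 16))
image_ids = torch.randint(0, IMAGE_CODES, (1, 64)) + TEXT_VOCAB  # offset into shared vocab
stream = torch.cat([text_ids, image_ids], dim=1)

model = TinyUnifiedLM()
logits = model(stream[:, :-1])
# The same next-token loss covers text and image positions alike
loss = nn.functional.cross_entropy(logits.reshape(-1, VOCAB), stream[:, 1:].reshape(-1))
```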
Next Visual Granularity (NVG) is an image generation approach that decomposes an image into a structured sequence and progressively refines it from global layout to fine detail. Guided by the input text and the current canvas, the hierarchical representation yields high-fidelity, diverse images. Extensive training on ImageNet shows NVG outperforming previous models, with clear scaling behavior and improved FID scores.
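One illustrative reading of that coarse-to-fine loop, assuming a fixed-size canvas that is allowed more distinct values at each stage; the refiner and quantizer below are stand-ins, not NVG's learned decomposition:

```python
import torch

def refiner(canvas, text_emb):
    # Hypothetical learned stage: predicts a more detailed canvas from the
    # current canvas and the text condition.
    return (canvas + 0.1 * torch.randn_like(canvas)).clamp(0, 1)

def quantize(x, levels):
    # Snap pixel values to `levels` distinct values: few levels = blocky layout,
    # many levels = fine detail.
    return torch.round(x * (levels - 1)) / (levels - 1)

text_emb = torch.randn(1, 512)        # assumed text embedding
canvas = torch.zeros(1, 3, 64, 64)    # spatial size stays fixed throughout
for levels in (2, 4, 16, 64, 256):    # granularity grows stage by stage
    canvas = quantize(refiner(canvas, text_emb), levels)
```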
The article presents the Decoupled Diffusion Transformer (DDT), which separates the diffusion network into a condition encoder and a velocity decoder and shows that allocating more capacity to the encoder improves performance. It achieves state-of-the-art FID scores on ImageNet benchmarks and accelerates inference by reusing encoder outputs across adjacent denoising steps. The implementation provides detailed training and inference configurations, along with online demos.
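A sketch of the encoder-reuse trick during sampling, with placeholder encoder/decoder modules rather than DDT's actual blocks; the `reuse_every` interval is an invented knob for illustration:

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256))  # heavy "condition encoder"
decoder = nn.Linear(256 + 1, 3 * 32 * 32)                           # light "velocity decoder"

x = torch.randn(1, 3, 32, 32)   # start sampling from noise
steps, reuse_every = 50, 5      # recompute encoder features only every 5 steps
z = None
for i in range(steps):
    t = torch.full((1, 1), 1.0 - i / steps)
    if i % reuse_every == 0:    # refresh the cached encoder features
        z = encoder(x)
    v = decoder(torch.cat([z, t], dim=1)).view_as(x)  # velocity prediction
    x = x - v / steps           # simple Euler step along the flow
```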
The paper presents BLIP3-o, a family of fully open unified multimodal models that strengthen both image understanding and generation. It introduces a diffusion transformer that generates CLIP image features, advocates a sequential pretraining strategy (understanding first, then generation), and contributes a high-quality instruction-tuning dataset, BLIP3o-60k, that improves performance across benchmarks. The models, code, and datasets are open-sourced to foster further research.
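A minimal flow-matching objective on CLIP-style feature vectors, the kind of regression target such a diffusion transformer trains toward; the feature width and the predictor network are assumptions, not the released model:

```python
import torch
import torch.nn as nn

feat_dim = 1024                            # assumed CLIP feature width
predictor = nn.Sequential(                 # stand-in for the diffusion transformer
    nn.Linear(feat_dim + 1, 2048), nn.GELU(), nn.Linear(2048, feat_dim)
)

clip_feats = torch.randn(8, feat_dim)      # target CLIP image features (x1)
noise = torch.randn_like(clip_feats)       # source noise (x0)
t = torch.rand(8, 1)                       # random timestep per sample
x_t = (1 - t) * noise + t * clip_feats     # linear interpolation path
target_v = clip_feats - noise              # ground-truth velocity along the path
pred_v = predictor(torch.cat([x_t, t], dim=1))
loss = nn.functional.mse_loss(pred_v, target_v)  # flow-matching regression loss
```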
HunyuanImage-3.0 has been released as an open-source image generation model built on a unified multimodal architecture that integrates text and image understanding. Its Mixture-of-Experts backbone totals 80 billion parameters, which the release describes as the largest open-source image-generation MoE to date, and it supports extensive customization through multiple checkpoints and performance optimizations.
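For orientation, a generic top-k Mixture-of-Experts routing sketch showing why an 80B-parameter MoE stays tractable: each token activates only a few experts, not all of them. This is not HunyuanImage-3.0's implementation:

```python
import torch
import torch.nn as nn

dim, n_experts, top_k = 64, 8, 2
experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))
router = nn.Linear(dim, n_experts)  # scores each token against each expert

tokens = torch.randn(16, dim)
weights, idx = router(tokens).softmax(-1).topk(top_k, dim=-1)  # keep 2 of 8 experts
out = torch.zeros_like(tokens)
for slot in range(top_k):
    for e in range(n_experts):
        mask = idx[:, slot] == e      # tokens routed to expert e in this slot
        if mask.any():
            out[mask] += weights[mask, slot, None] * experts[e](tokens[mask])
```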
PixelFlow is a family of image generation models that operate directly in pixel space, eliminating the need for a pre-trained VAE and enabling end-to-end training. Using efficient cascade flow modeling, it achieves strong image quality, with an FID of 1.98 on the class-conditional ImageNet benchmark, and handles both class-to-image and text-to-image generation.
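A rough sketch of cascade flow sampling in pixel space, where each stage runs Euler steps at one resolution before upsampling to the next; the velocity field and the stage schedule are placeholders, not PixelFlow's trained model:

```python
import torch
import torch.nn.functional as F

def velocity(x, t):
    # Hypothetical velocity field; in PixelFlow a single learned model
    # serves every resolution stage.
    return -x * t

x = torch.randn(1, 3, 32, 32)                 # cheapest stage starts from pure noise
for size, steps in ((32, 10), (64, 10), (128, 10), (256, 10)):
    x = F.interpolate(x, size=size, mode="bilinear", align_corners=False)
    for i in range(steps):
        t = torch.tensor(1.0 - i / steps)
        x = x + velocity(x, t) / steps        # Euler integration of the flow
print(x.shape)  # torch.Size([1, 3, 256, 256])
```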
CogView4-6B is a text-to-image generation model that supports a range of resolutions and reduces memory usage through CPU offloading. It reports strong benchmark results against models such as DALL-E 3 and SDXL across various evaluation metrics. The model card walks through installing the required libraries and generating images from detailed prompts with a short code snippet, sketched below.
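A sketch of that snippet, assuming a recent diffusers release that ships CogView4Pipeline; the offload and VAE slicing/tiling calls are standard diffusers memory optimizations, and the prompt and output path are placeholders:

```python
import torch
from diffusers import CogView4Pipeline

pipe = CogView4Pipeline.from_pretrained("THUDM/CogView4-6B", torch_dtype=torch.bfloat16)

# Cut peak GPU memory: offload idle modules to CPU, run the VAE in slices/tiles
pipe.enable_model_cpu_offload()
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()

prompt = "A detailed painting of a lighthouse at dawn"  # any detailed prompt
image = pipe(
    prompt=prompt,
    guidance_scale=3.5,
    num_inference_steps=50,
    width=1024,
    height=1024,
).images[0]
image.save("cogview4.png")
```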