
apple/starflow · Hugging Face

4 min read | Saved February 14, 2026

image-generation 🤖 video-generation 🤖 machine-learning 🤖 transformers 🤖 open-source 🤖

Do you care about this?

STARFlow and STARFlow-V are open-source models for generating high-quality images and videos from text prompts. They combine autoregressive models with normalizing flows and deliver strong results on both text-to-image and text-to-video tasks. The release includes scripts and configurations that make it straightforward to set up the models and start generating content.

If you do, here's more

STARFlow and STARFlow-V are transformer autoregressive flow models for generating high-quality images and videos. Their architecture combines the modeling strength of autoregressive transformers with the efficiency of normalizing flows, and it performs well on both text-to-image and text-to-video tasks. STARFlow was selected as a NeurIPS 2025 Spotlight for its work on scaling latent normalizing flows to high-resolution image synthesis, while STARFlow-V extends the approach to end-to-end video generative modeling.
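
To make "autoregressive flow" concrete, here is a minimal toy sketch. It assumes a single affine autoregressive layer of the kind used in transformer autoregressive flows, and it substitutes a tiny linear conditioner (hypothetical weights `W_MU` and `W_S`) for STARFlow's transformer. It is not the repository's implementation. The sketch shows the two properties the paragraph relies on: the transform is exactly invertible, and the likelihood is exact through the change-of-variables formula.

```python
import math
import random

random.seed(0)
D = 6
# Hypothetical conditioner weights. Only entries with j < i are nonzero, so
# position i depends strictly on earlier positions (a causal, autoregressive mask).
W_MU = [[random.gauss(0, 0.5) if j < i else 0.0 for j in range(D)] for i in range(D)]
W_S = [[random.gauss(0, 0.1) if j < i else 0.0 for j in range(D)] for i in range(D)]

def conditioner(x, i):
    """Shift mu_i and log-scale s_i for position i, computed from x[:i] only."""
    mu = sum(W_MU[i][j] * x[j] for j in range(i))
    s = sum(W_S[i][j] * x[j] for j in range(i))
    return mu, s

def forward(x):
    """Data -> latent: z_i = (x_i - mu_i) * exp(-s_i).

    All of x is known here, so a real model computes every position in one
    parallel pass; this toy simply loops.
    """
    z, log_det = [], 0.0
    for i in range(len(x)):
        mu, s = conditioner(x, i)
        z.append((x[i] - mu) * math.exp(-s))
        log_det -= s  # triangular Jacobian: log|det dz/dx| = -sum_i s_i
    return z, log_det

def inverse(z):
    """Latent -> data, one position at a time, like autoregressive sampling."""
    x = []
    for i in range(len(z)):
        mu, s = conditioner(x, i)  # x currently holds x[:i]
        x.append(z[i] * math.exp(s) + mu)
    return x

def log_likelihood(x):
    """Exact log p(x) via change of variables with a standard-normal prior on z."""
    z, log_det = forward(x)
    log_pz = -0.5 * sum(v * v for v in z) - 0.5 * len(z) * math.log(2 * math.pi)
    return log_pz + log_det

x = [random.gauss(0, 1) for _ in range(D)]
z, log_det = forward(x)
x_rec = inverse(z)
print(max(abs(a - b) for a, b in zip(x, x_rec)))  # reconstruction error, near zero
```

The asymmetry is the design trade-off: encoding data is a single parallel pass, while sampling has to proceed position by position, because each step needs the outputs that came before it.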

Using the models takes some setup. Users clone the GitHub repository, then either create a conda environment or install the dependencies manually. Next, they download the pretrained model checkpoints and place them in the designated directories. To generate images, users run the provided scripts with a basic prompt, optionally adjusting settings such as aspect ratio and batch size. Video generation works the same way, but it supports longer sequences and can take an input image as conditioning for more control over the result. The scripts wrap both image and video generation in ready-made commands.
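
An aspect-ratio setting usually has to be turned into a concrete width and height that the model can accept. The sketch below shows one common way to do this, assuming a fixed pixel budget (1024 × 1024 here) and sides rounded to a multiple of 16, which stands in for the latent downsampling times the patch size. The function name, the budget, and the multiple are all hypothetical illustrations, not the repository's actual options.

```python
import math

def resolution_for_aspect(aspect: str, pixel_budget: int = 1024 * 1024,
                          multiple: int = 16) -> tuple[int, int]:
    """Pick (width, height) near `pixel_budget` pixels with the requested aspect.

    Both sides are snapped to `multiple` (hypothetical value), because
    patch-based latent models need dimensions that divide evenly.
    """
    w_ratio, h_ratio = (float(v) for v in aspect.split(":"))
    # Solve width * height = budget subject to width / height = w_ratio / h_ratio.
    height = math.sqrt(pixel_budget * h_ratio / w_ratio)
    width = height * w_ratio / h_ratio

    def snap(v: float) -> int:
        return max(multiple, int(round(v / multiple)) * multiple)

    return snap(width), snap(height)

print(resolution_for_aspect("16:9"))  # (1360, 768)
```

Because the total pixel count is held roughly constant, wide and tall outputs cost about the same compute as a square one.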

Users can also train their own models. For image generation, a quick training test starts with a single command, or users can launch a full training run with custom parameters. Video training works the same way and can resume from a saved checkpoint when needed. Other key features include efficient training, support for multiple resolutions, and advanced text conditioning. The architecture supports large-scale distributed training and fast sampling, which makes it practical for demanding research and applied work in computer vision and generative modeling.
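
Resuming from a checkpoint usually starts by finding the most recent saved state. The sketch below assumes a hypothetical naming scheme (`step_<N>.pt` in one directory) that is not taken from the repository. It compares step numbers numerically rather than sorting filenames, since a plain string sort would rank `step_300.pt` above `step_2000.pt`.

```python
import re
import tempfile
from pathlib import Path
from typing import Optional

# Hypothetical naming scheme: checkpoints saved as "step_<N>.pt".
CKPT_PATTERN = re.compile(r"^step_(\d+)\.pt$")

def latest_checkpoint(ckpt_dir: str) -> Optional[Path]:
    """Return the checkpoint with the highest step number, or None if there is none."""
    directory = Path(ckpt_dir)
    if not directory.is_dir():
        return None
    best_step, best_path = -1, None
    for path in directory.iterdir():
        match = CKPT_PATTERN.match(path.name)
        if match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), path
    return best_path

# Usage: unrelated files are ignored, and step numbers compare as integers.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["step_100.pt", "step_2000.pt", "step_300.pt", "notes.txt"]:
        (Path(tmp) / name).touch()
    resume_from = latest_checkpoint(tmp)
    print(resume_from.name)  # step_2000.pt
```

A training script would then load the model and optimizer state from `resume_from`, or start from step 0 when the function returns `None`.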

The source page documents configuration options for image and video generation, including sampling and model-architecture parameters. It also describes the project layout, which contains scripts for training and sampling along with utilities for processing. Tips for improving output quality include adjusting the guidance scale and experimenting with aspect ratios. Together, these help users work effectively with STARFlow and STARFlow-V on image and video synthesis.
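
"Guidance scale" most likely refers to classifier-free guidance. The source does not say exactly where STARFlow applies guidance inside its flow, so the sketch below shows only the generic linear form: it blends a text-conditional prediction with an unconditional one. The function name and the example values are hypothetical.

```python
def apply_cfg(cond, uncond, guidance_scale):
    """Classifier-free guidance: move from the unconditional prediction toward,
    and past, the text-conditional one.

    guidance_scale = 1 returns `cond` unchanged; larger values exaggerate the
    difference that the prompt makes.
    """
    return [u + guidance_scale * (c - u) for c, u in zip(cond, uncond)]

cond = [1.0, -0.5, 0.25]    # hypothetical prediction with the text prompt
uncond = [0.5, 0.0, 0.25]   # hypothetical prediction with an empty prompt
print(apply_cfg(cond, uncond, 3.0))  # [2.0, -1.5, 0.25]
```

Higher scales typically make outputs follow the prompt more closely, at some cost in diversity and naturalness. That trade-off is why the tips suggest tuning the scale rather than fixing it.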
