FARMER is a generative framework that combines Normalizing Flows with Autoregressive models for likelihood estimation and high-quality image synthesis directly from raw pixels. An invertible autoregressive flow converts images into latent sequences, and a self-supervised dimension reduction method makes modeling those sequences more efficient. In experiments, FARMER performs competitively with existing models while providing exact likelihoods and scalable training.
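
As a rough illustration of why an invertible autoregressive flow yields exact likelihoods: for an invertible map z = f(x), the change-of-variables formula gives log p(x) = log p_z(f(x)) + log |det ∂f/∂x|, and an autoregressive map has a triangular Jacobian whose log-determinant is a simple sum. The sketch below is a toy affine autoregressive flow in NumPy, not FARMER's architecture. The `conditioner` function and its coefficients are hypothetical stand-ins for a learned network, and the standard normal prior is an assumption.

```python
import numpy as np

def conditioner(prev):
    """Hypothetical stand-in for a learned autoregressive network.

    Returns a shift and a log-scale that depend only on the previous element.
    """
    mu = 0.5 * prev
    log_scale = 0.1 * np.tanh(prev)
    return mu, log_scale

def forward(x):
    """Map data x to latent z; also return log|det dz/dx|."""
    z = np.empty_like(x)
    log_det = 0.0
    prev = 0.0  # context for the first element (assumed zero)
    for i in range(len(x)):
        mu, s = conditioner(prev)
        z[i] = (x[i] - mu) * np.exp(-s)
        log_det -= s  # triangular Jacobian: diagonal entries are exp(-s)
        prev = x[i]
    return z, log_det

def inverse(z):
    """Recover x from z one element at a time (sequential sampling order)."""
    x = np.empty_like(z)
    prev = 0.0
    for i in range(len(z)):
        mu, s = conditioner(prev)
        x[i] = z[i] * np.exp(s) + mu
        prev = x[i]
    return x

def log_likelihood(x):
    """Exact log p(x) under a standard normal prior on z (assumed)."""
    z, log_det = forward(x)
    log_pz = -0.5 * np.sum(z ** 2) - 0.5 * len(z) * np.log(2.0 * np.pi)
    return log_pz + log_det

x = np.array([0.3, -1.2, 0.7, 2.0])
z, _ = forward(x)
x_rec = inverse(z)    # matches x up to floating-point error
ll = log_likelihood(x)
```

The forward pass can be evaluated for all positions at once in a real model, since every conditioner input comes from the data, whereas the inverse is inherently sequential. This is the usual trade-off of autoregressive flows.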
PixelFlow generates images directly in raw pixel space, eliminating the need for a pre-trained Variational Autoencoder (VAE). It relies on efficient cascade flow modeling and achieves a competitive FID score of 1.98 on the ImageNet benchmark, producing high-quality, semantically controllable images. The authors hope the work will inspire future visual generation models.
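
For context on the reported metric: FID is the Fréchet distance between two Gaussians fitted to feature vectors of real and generated images, ||μ_a − μ_b||² + Tr(Σ_a + Σ_b − 2(Σ_a Σ_b)^{1/2}), where lower is better. The sketch below computes that distance in NumPy on arbitrary feature arrays. It is not the PixelFlow evaluation pipeline: the function name `frechet_distance` is hypothetical, and the random arrays stand in for the Inception-network features a real FID evaluation would use.

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussians fitted to two feature sets.

    feats_a, feats_b: arrays of shape (num_samples, feature_dim).
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((cov_a cov_b)^{1/2}) equals the sum of square roots of the
    # eigenvalues of cov_a @ cov_b, which are real and non-negative for
    # positive semi-definite inputs (clipped to guard against round-off).
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    mean_term = np.sum((mu_a - mu_b) ** 2)
    return float(mean_term + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)

rng = np.random.default_rng(0)
real = rng.normal(size=(500, 4))   # placeholder "real" features
fake = real + 3.0                  # same covariance, mean shifted by 3 per dim
d_same = frechet_distance(real, real)
d_shift = frechet_distance(real, fake)
```

With identical inputs the distance is zero up to round-off, and a pure mean shift contributes only the squared-mean term, which makes the two components of the metric easy to check separately.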