FARMER is a generative framework that integrates Normalizing Flows and Autoregressive models for exact likelihood estimation and high-quality image synthesis directly from raw pixel data. It uses an invertible autoregressive flow to convert images into latent sequences and applies a self-supervised dimension reduction method to make these sequences easier to model. Experimental results show that FARMER achieves competitive performance compared to existing models while providing exact likelihoods and scalable training.
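As a rough illustration of why such a combination can yield exact likelihoods, the generic change-of-variables identity for an invertible map, paired with an autoregressive factorization of the latent sequence, can be written as below. The symbols $f$, $z_t$, and $T$ are assumptions introduced here for illustration; this is the standard textbook formulation, not necessarily the exact objective used by FARMER.

```latex
% Assumed notation: x is an image, f is an invertible flow,
% z = f(x) = (z_1, ..., z_T) is the resulting latent sequence.
\begin{align}
  \log p_X(x) &= \log p_Z\bigl(f(x)\bigr)
                 + \log \left| \det \frac{\partial f(x)}{\partial x} \right|, \\
  \log p_Z(z) &= \sum_{t=1}^{T} \log p\bigl(z_t \mid z_{<t}\bigr).
\end{align}
```

Because $f$ is invertible and each conditional is a normalized density, both terms can be evaluated exactly, so the likelihood is computed rather than bounded.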
Recent advances in generative diffusion models show that they capture image style and semantics. This paper introduces an attention distillation loss that improves the transfer of visual characteristics from reference images to generated ones. The loss is used both to optimize the synthesis process and to improve Classifier Guidance, which yields faster and more versatile image generation. Extensive experiments validate the effectiveness of this approach in style and texture transfer.
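The summary does not spell out the form of the loss, so the following is only a minimal sketch of one plausible attention-based distillation loss. It assumes the loss compares the generated image's own attention output with the output obtained when the same queries attend to the reference image's keys and values, and it assumes an L1 distance. The function names (`attention`, `attention_distillation_loss`) and the toy shapes are hypothetical and are not taken from the paper.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    # Scaled dot-product attention for 2-D arrays of shape (tokens, dim).
    scale = 1.0 / np.sqrt(q.shape[-1])
    weights = softmax((q @ k.T) * scale, axis=-1)
    return weights @ v


def attention_distillation_loss(q_gen, k_gen, v_gen, k_ref, v_ref):
    # Assumed form: L1 distance between the generated image's self-attention
    # output and the output when its queries attend to the reference features.
    self_out = attention(q_gen, k_gen, v_gen)
    target_out = attention(q_gen, k_ref, v_ref)
    return float(np.mean(np.abs(self_out - target_out)))


# Toy features standing in for attention-layer activations (hypothetical).
rng = np.random.default_rng(0)
q_gen = rng.normal(size=(16, 8))
k_gen = rng.normal(size=(16, 8))
v_gen = rng.normal(size=(16, 8))
k_ref = rng.normal(size=(20, 8))
v_ref = rng.normal(size=(20, 8))

loss_diff = attention_distillation_loss(q_gen, k_gen, v_gen, k_ref, v_ref)
# When the generated keys/values equal the reference ones, the loss vanishes.
loss_same = attention_distillation_loss(q_gen, k_ref, v_ref, k_ref, v_ref)
print(loss_diff, loss_same)
```

In a real diffusion pipeline the features would come from the denoising network's attention layers, and the loss gradient would be taken with respect to the latent being synthesized; this sketch only shows the shape of the comparison.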