12 links tagged with generative-models
Links
Pingkit is a toolkit designed for training reproducible, capacity-aware models using transformer activations. It offers features for extracting embeddings, training neural architectures, and creating custom probes tailored to specific research needs. The toolkit is integrated with Hugging Face models and provides various utilities for data processing and model training.
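Pingkit's own API is not documented here, so the sketch below only illustrates the underlying idea such a toolkit wraps: pulling per-layer activations from a Hugging Face transformer and pooling them into embeddings. The checkpoint name and the mean-pooling choice are assumptions for illustration, not Pingkit's interface.

```python
# Minimal sketch: extracting transformer activations with Hugging Face
# (illustrates the kind of embedding extraction a toolkit like Pingkit wraps;
#  the checkpoint and pooling are assumptions, not Pingkit's API).
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "bert-base-uncased"          # assumed example checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_hidden_states=True)
model.eval()

texts = ["a sample sentence", "another sample"]
batch = tokenizer(texts, padding=True, return_tensors="pt")

with torch.no_grad():
    out = model(**batch)

# out.hidden_states is a tuple: (embedding layer, layer_1, ..., layer_N)
layer_acts = out.hidden_states[-1]                      # last layer, (B, T, H)
mask = batch["attention_mask"].unsqueeze(-1)            # ignore padding tokens
embeddings = (layer_acts * mask).sum(1) / mask.sum(1)   # mean-pooled embeddings
print(embeddings.shape)                                 # (2, hidden_size)
```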
The essay critiques prevailing perspectives on world models, which are widely seen as essential for developing virtual agents with artificial general intelligence. Drawing on examples from science fiction and psychology, it argues that a world model should simulate all actionable possibilities of the real world to support effective reasoning and action, and it proposes a hierarchical architecture for such models within a Physical, Agentic, and Nested (PAN) AGI framework.
FARMER is a novel generative framework that integrates Normalizing Flows and Autoregressive models for effective likelihood estimation and high-quality image synthesis directly from raw pixel data. It incorporates an invertible autoregressive flow to convert images into latent sequences and employs a self-supervised dimension reduction method to optimize the modeling process. Experimental results show that FARMER achieves competitive performance compared to existing models while ensuring exact likelihoods and scalable training.
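A minimal sketch of the exact-likelihood machinery flow-based models such as FARMER rely on: an invertible map z = f(x) gives log p(x) = log p_z(f(x)) + log|det df/dx|. The elementwise affine flow below is a toy stand-in, not FARMER's autoregressive architecture.

```python
# Hedged sketch of the change-of-variables likelihood behind flow-based models:
#   log p(x) = log p_z(f(x)) + log |det df/dx|
# The elementwise affine flow is a toy stand-in for an autoregressive flow.
import torch
import torch.distributions as D

def affine_flow(x, log_scale, shift):
    """Invertible elementwise transform z = x * exp(log_scale) + shift."""
    z = x * torch.exp(log_scale) + shift
    log_det = log_scale.sum(dim=-1)      # log|det J| of an elementwise affine map
    return z, log_det

x = torch.randn(4, 16)                   # a batch of flattened "pixel" vectors
log_scale = torch.zeros(16, requires_grad=True)
shift = torch.zeros(16, requires_grad=True)

z, log_det = affine_flow(x, log_scale, shift)
prior = D.Normal(0.0, 1.0)
log_px = prior.log_prob(z).sum(dim=-1) + log_det   # exact per-sample log-likelihood
loss = -log_px.mean()                              # maximum-likelihood objective
loss.backward()
```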
In-context Ranking (ICR) utilizes the contextual understanding of large language models (LLMs) for information retrieval by incorporating the task description, candidate documents, and query into the model's input. This paper introduces BlockRank, a new method that enhances the efficiency of attention operations in LLMs by enforcing inter-document block sparsity and optimizing query-document relevance, achieving significant performance improvements and scalability for long context retrieval tasks. Experiments demonstrate that BlockRank matches or surpasses state-of-the-art methods while being considerably more efficient at inference.
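For intuition, here is a rough sketch of how an ICR input might be assembled, with the task description, candidate documents, and query placed in a single context; the template is illustrative, not BlockRank's actual prompt format.

```python
# Hedged sketch of an In-context Ranking (ICR) prompt: the task description,
# the candidate documents, and the query all go into one LLM context.
# This template is illustrative, not the exact format used by BlockRank.
def build_icr_prompt(task_description, documents, query):
    doc_blocks = "\n".join(f"[{i}] {doc}" for i, doc in enumerate(documents))
    return (
        f"{task_description}\n\n"
        f"Candidate documents:\n{doc_blocks}\n\n"
        f"Query: {query}\n"
        f"Answer with the index of the most relevant document."
    )

prompt = build_icr_prompt(
    "Rank the documents below by relevance to the query.",
    ["The Eiffel Tower is in Paris.", "Penguins live in Antarctica."],
    "Where is the Eiffel Tower?",
)
print(prompt)
```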
The paper explores how to make reward modeling for reinforcement learning with large language models scale at inference time. It introduces Self-Principled Critique Tuning (SPCT) to improve generative reward modeling and a meta reward model to guide the aggregation of sampled reward judgments at inference. Empirical results show that SPCT significantly improves reward-model quality and inference-time scalability compared to existing methods.
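As a rough illustration of inference-time scaling for reward models, the sketch below samples several independent reward judgments for one response and aggregates them; plain averaging stands in for the paper's meta-RM-guided aggregation, and the scoring function is a placeholder.

```python
# Minimal sketch of inference-time scaling for a reward model: sample several
# independent judgments for the same response and aggregate them.
# Plain averaging is a stand-in for SPCT's meta-RM-guided aggregation.
import random

def sample_reward(prompt, response, rng):
    """Placeholder for one stochastic pass of a generative reward model."""
    return rng.uniform(0.0, 10.0)         # assumed score range, for illustration

def scaled_reward(prompt, response, k=8, seed=0):
    rng = random.Random(seed)
    scores = [sample_reward(prompt, response, rng) for _ in range(k)]
    return sum(scores) / len(scores)       # aggregate k sampled judgments

print(scaled_reward("Explain RL.", "RL optimizes expected return.", k=8))
```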
PixelFlow introduces a novel approach to image generation by operating directly in raw pixel space, eliminating the need for pre-trained Variational Autoencoders. This method enhances the image generation process with efficient cascade flow modeling, achieving a competitive FID score of 1.98 on the ImageNet benchmark while offering high-quality and semantically controlled image outputs. The work aims to inspire future developments in visual generation models.
Contemporary generative models leverage a two-stage approach, first extracting a latent representation of input signals via an autoencoder, then training a generative model on these latents. This method enhances efficiency by focusing on perceptually meaningful information while reducing the computational burden associated with processing raw pixel or waveform data. The article details the training process, the evolution of generative techniques, and the significance of latent representations in modern applications.
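A minimal sketch of that two-stage recipe, assuming a frozen toy autoencoder and a toy denoiser trained on its latents; all modules below are placeholders, not any particular system's architecture.

```python
# Hedged sketch of the two-stage recipe: (1) a frozen autoencoder maps inputs
# to compact latents, (2) a generative model (here a toy denoiser) is trained
# on those latents instead of raw pixels. All modules are placeholders.
import torch
import torch.nn as nn

class TinyAutoencoder(nn.Module):
    def __init__(self, dim=784, latent=32):
        super().__init__()
        self.enc = nn.Linear(dim, latent)
        self.dec = nn.Linear(latent, dim)
    def encode(self, x):
        return self.enc(x)

autoencoder = TinyAutoencoder().eval()          # stage 1: assumed pre-trained, frozen
denoiser = nn.Sequential(nn.Linear(33, 64), nn.ReLU(), nn.Linear(64, 32))
opt = torch.optim.Adam(denoiser.parameters(), lr=1e-3)

x = torch.randn(16, 784)                        # stand-in for a data batch
with torch.no_grad():
    z0 = autoencoder.encode(x)                  # latents, not raw pixels

# stage 2: one denoising-style training step on the latents
t = torch.rand(z0.size(0), 1)                   # noise level per sample
noise = torch.randn_like(z0)
zt = (1 - t) * z0 + t * noise                   # noised latent
pred = denoiser(torch.cat([zt, t], dim=-1))     # predict the clean latent
loss = ((pred - z0) ** 2).mean()
opt.zero_grad(); loss.backward(); opt.step()
```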
PixelFlow introduces a novel family of image generation models that operate directly in pixel space, eliminating the need for pre-trained VAEs and allowing for end-to-end training. By utilizing efficient cascade flow modeling, it achieves impressive image quality with a low FID score of 1.98 on the ImageNet benchmark, showcasing its potential for both class-to-image and text-to-image tasks. The model aims to inspire future advancements in visual generation technologies.
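For intuition, a hedged sketch of the kind of pixel-space flow-matching step such models build on: interpolate between noise and an image and regress the path's velocity. The tiny ConvNet is a placeholder, and the cascade/multi-resolution scheme is not shown.

```python
# Hedged sketch of a flow-matching training step directly in pixel space.
# The tiny ConvNet is a placeholder; PixelFlow's cascade scheme is not shown.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Conv2d(4, 32, 3, padding=1), nn.SiLU(),
                    nn.Conv2d(32, 3, 3, padding=1))
opt = torch.optim.Adam(net.parameters(), lr=1e-4)

x1 = torch.rand(8, 3, 32, 32)                   # target images (stand-in batch)
x0 = torch.randn_like(x1)                       # noise sample
t = torch.rand(8, 1, 1, 1)                      # random time in [0, 1]

xt = (1 - t) * x0 + t * x1                      # linear interpolation path
target_v = x1 - x0                              # velocity of that path
t_map = t.expand(-1, 1, xt.size(2), xt.size(3)) # time as an extra input channel
pred_v = net(torch.cat([xt, t_map], dim=1))     # predict the velocity field

loss = ((pred_v - target_v) ** 2).mean()
opt.zero_grad(); loss.backward(); opt.step()
```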
ContinualFlow is a novel framework designed for targeted unlearning in generative models, utilizing Flow Matching and an energy-based reweighting loss to effectively remove undesired data distribution regions without extensive retraining. The method demonstrates its effectiveness through various experiments and provides visualizations and quantitative evaluations to support its claims.
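A rough sketch of the general idea of reweighting a flow-matching loss so that samples from a region to be unlearned contribute less; the weight function below is a toy placeholder, not ContinualFlow's energy-based formulation.

```python
# Hedged sketch of reweighting a flow-matching loss so samples from a region
# to be "unlearned" contribute less. The weight function is a toy placeholder.
import torch

def forget_weight(x1, energy_fn, temperature=1.0):
    """Soft per-sample weight in [0, 1]; low where the energy flags 'forget'."""
    return torch.sigmoid(-energy_fn(x1) / temperature)

def reweighted_fm_loss(pred_v, target_v, weights):
    per_sample = ((pred_v - target_v) ** 2).flatten(1).mean(dim=1)
    return (weights * per_sample).mean()

# toy usage: mark samples whose first coordinate is positive as "forget"
x1 = torch.randn(8, 2)
energy = lambda x: 5.0 * x[:, 0]                 # assumed toy energy function
w = forget_weight(x1, energy)
pred_v, target_v = torch.randn(8, 2), torch.randn(8, 2)
print(reweighted_fm_loss(pred_v, target_v, w))
```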
UCGM is an official PyTorch implementation that provides a unified framework for training and sampling continuous generative models, such as diffusion and flow-matching models. It enables significant acceleration of sampling processes and efficient tuning of pre-trained models, achieving impressive FID scores across various datasets and resolutions. The framework supports diverse architectures and offers tools for both training and evaluating generative models.
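As an illustration of the kind of sampling loop a unified framework exposes, here is a plain few-step Euler integrator over a learned velocity field; the `velocity_model` callable and its signature are assumptions, not UCGM's actual API.

```python
# Hedged sketch of a generic few-step Euler sampler of the sort a unified
# diffusion / flow-matching framework provides. `velocity_model` and its
# call signature are assumptions, not UCGM's actual API.
import torch

@torch.no_grad()
def euler_sample(velocity_model, shape, steps=20, device="cpu"):
    x = torch.randn(shape, device=device)          # start from pure noise at t=0
    ts = torch.linspace(0.0, 1.0, steps + 1, device=device)
    for i in range(steps):
        t, t_next = ts[i], ts[i + 1]
        v = velocity_model(x, t.expand(shape[0]))  # predicted velocity at time t
        x = x + (t_next - t) * v                   # Euler step toward the data
    return x

# toy usage with a dummy velocity field
dummy = lambda x, t: -x                            # placeholder model
samples = euler_sample(dummy, (4, 3, 32, 32), steps=10)
print(samples.shape)
```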
This repository contains decrypted generative-model safety files for Apple Intelligence: the filter rules that determine how the models should respond to harmful content. It includes scripts for retrieving the encryption keys and decrypting the overrides, as well as tools for combining and deduplicating the metadata for easier review. The combined metadata makes it easier to analyze the safety filters across different contexts, including global and region-specific content rules.
Recent advancements in generative diffusion models highlight their ability to understand image style and semantics. The paper introduces a novel attention distillation loss that enhances the transfer of visual characteristics from reference images to generated ones, optimizing the synthesis process and improving Classifier Guidance for faster and more versatile image generation. Extensive experiments validate the effectiveness of this approach in style and texture transfer.
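A toy sketch of the attention-distillation idea: compute attention-style features for a style reference and for the generated image, then backpropagate their mismatch into the generated image. The feature extractor below is a placeholder, not the paper's diffusion U-Net attention maps.

```python
# Hedged sketch of an attention-distillation-style loss: match attention
# features of the generated image to those of a style reference.
# The feature extractor is a toy placeholder, not a diffusion U-Net.
import torch
import torch.nn.functional as F

def attention_features(x, proj_q, proj_k):
    """Toy stand-in for self-attention maps over flattened spatial tokens."""
    b, c, h, w = x.shape
    tokens = x.flatten(2).transpose(1, 2)           # (B, HW, C)
    q, k = proj_q(tokens), proj_k(tokens)
    return torch.softmax(q @ k.transpose(1, 2) / (q.size(-1) ** 0.5), dim=-1)

proj_q = torch.nn.Linear(3, 16)
proj_k = torch.nn.Linear(3, 16)

reference = torch.rand(1, 3, 16, 16)                # style reference image
generated = torch.rand(1, 3, 16, 16, requires_grad=True)

with torch.no_grad():
    ref_attn = attention_features(reference, proj_q, proj_k)
gen_attn = attention_features(generated, proj_q, proj_k)

loss = F.mse_loss(gen_attn, ref_attn)               # distillation target
loss.backward()                                     # gradient flows to `generated`
print(generated.grad.abs().mean())
```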