Links
Google is developing Nano Banana 2 Flash, a new AI image generator that offers faster performance than its predecessor, Nano Banana Pro. The model is built on Gemini 3 Flash and is expected to launch by the end of the quarter amid growing competition in the AI image generation market.
FLUX.2 is a new image generation and editing model that excels in creating high-quality images while maintaining consistency across multiple references. It supports detailed typography and complex prompts, making it suitable for various creative workflows. The model emphasizes open innovation, offering different versions for developers and teams.
Google is set to release the Gemini 3 Pro preview in November, with a wider rollout expected in December. This new model will likely support a 1 million token context window, enhancing its capabilities for handling large documents and data. Speculation also surrounds its potential integration with the upcoming Nano Banana 2 image generation model.
Somake is an AI-driven platform that allows users to generate images and videos without needing design skills. You can create visuals by simply describing your idea or uploading an image, and the tool offers features like background removal and image enhancement. It’s designed for creators looking to streamline their content production.
LateNiteSoft tested over 600 image generations across various AI models to determine which performs best for common photo edits. The article highlights the strengths and weaknesses of models like OpenAI, Gemini, and Seedream in different editing scenarios, from classic filters to style transfers.
ImagineX is an AI-powered platform that allows creators to generate videos and images quickly and professionally. It offers tools for text-to-image generation, short-form video creation, and more, making it accessible for users of all skill levels. Flexible pricing plans cater to individuals and teams.
STARFlow and STARFlow-V are open-source models designed for generating high-quality images and videos from text prompts. They combine autoregressive models with normalizing flows to achieve impressive results in both text-to-image and text-to-video tasks. Users can easily set up the models and start generating content with provided scripts and configurations.
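The normalizing-flow half of that combination rests on the change-of-variables formula: an invertible map plus a log-determinant correction gives exact log-likelihoods. A minimal sketch of that idea (a toy affine flow, not the STARFlow code):

```python
import math

# Toy normalizing flow: an invertible affine map z = (x - mu) / sigma.
# The change-of-variables formula adds log|det J| = -sum(log sigma) to the
# base log-density, which is how flow models score samples exactly.

def forward(x, mu, sigma):
    z = [(xi - m) / s for xi, m, s in zip(x, mu, sigma)]
    log_det = -sum(math.log(s) for s in sigma)
    return z, log_det

def inverse(z, mu, sigma):
    return [zi * s + m for zi, m, s in zip(z, mu, sigma)]

def log_prob(x, mu, sigma):
    # Standard-normal base density plus the Jacobian correction.
    z, log_det = forward(x, mu, sigma)
    base = sum(-0.5 * (zi * zi + math.log(2 * math.pi)) for zi in z)
    return base + log_det

x = [1.0, -2.0, 0.5]
mu, sigma = [0.5, -1.0, 0.0], [2.0, 1.0, 0.5]
z, log_det = forward(x, mu, sigma)
assert all(abs(a - b) < 1e-9 for a, b in zip(inverse(z, mu, sigma), x))
```

STARFlow stacks many learned invertible layers and drives them autoregressively; the bookkeeping above is the same in principle.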
Google is trialing a new image AI called "Nano Banana 2 Flash," which aims to be quicker and more affordable than its predecessor, the Nano Banana Pro. While it will be less powerful, it's designed for efficient image generation and editing. The model was identified by a reliable source known for leaking details on Gemini's technology.
GLM-Image is an open-source model that combines auto-regressive and diffusion techniques for high-quality image generation. It excels in generating detailed images from text prompts and supports various image editing tasks. The model uses a semantic-VQ tokenization strategy to enhance semantic understanding and visual fidelity.
Google is set to release Nano Banana version 2, called GEMPIX2, within the week, as indicated by new announcement cards in the Gemini interface. This model aims to serve creators and professionals, building on the success of its predecessor. Details on specific improvements are still pending.
Google’s new AI model, Nano Banana Pro, excels at generating images based on detailed prompts and integrating web data for infographics. While it shows significant improvements over previous models, it still has limitations, particularly in complex technical tasks like circuit design. The model is a step forward in creating factually accurate visuals for various applications.
Marble is a generative world model that creates 3D environments from text, images, and videos. It allows users to edit and expand these worlds interactively, offering fine control over the creative process. New features include tools for world editing and composition, enabling users to build larger, more detailed spaces.
The article introduces the FLUX.2 [klein] model family, which offers rapid image generation and editing capabilities in under half a second. It combines text-to-image and multi-reference generation in a compact architecture that runs efficiently on consumer hardware. Open weights are available for customization and fine-tuning.
Monet AI combines top video, image, and audio generation models into a single platform. Users can easily switch between different AI tools for various creative projects without the hassle of multiple registrations or interfaces. The platform offers a streamlined workflow and cost-effective solutions for creators.
OpenAI is set to launch new image generation models, Image-2 and Image-2-mini, designed to enhance visual quality and detail compared to the previous Image-1. Early tests show significant improvements in image fidelity and color accuracy, narrowing the gap with competitors like Google's Nano Banana 2. The rollout is likely to coincide with the anticipated GPT-5.2 release.
A study shows that AI image generators often default to 12 specific photo styles, regardless of the initial prompts. When tested through a visual telephone method, the images quickly lost detail but consistently converged on these familiar motifs, described as "visual elevator music."
Kolors AI offers a range of tools for generating images and designs tailored to specific needs, such as product photography, posters, and infographics. Users can easily input ideas and receive professional-grade visuals for various applications, including commercial use. The platform also supports virtual try-ons and AI photo editing.
This article discusses the capabilities of Google's new image generation model, Nano Banana, which boasts strong adherence to prompts and impressive editing features. The author compares it to previous models, evaluates its performance with complex prompts, and highlights its unique attributes.
This article details a platform that streamlines the creation of images and videos using AI. Users can generate themes, edit photos, and create animations quickly without needing extensive design skills. It offers features like photo restoration, clean cutouts, and customizable prompts for generating visuals.
Black Forest Labs, a German AI startup known for image generation, secured $300 million in a Series B funding round, valuing the company at $3.25 billion. The funds will support their research and development efforts, particularly for their new image-generation model, Flux 2. The company gained attention for its technology used in various applications, including by Elon Musk’s Grok chatbot.
OpenAI has updated ChatGPT to improve its image generation capabilities, allowing users to create and edit images more accurately and up to four times faster. This update aims to strengthen ChatGPT's position in a competitive market, particularly against Google's new AI model, Gemini 3.
Domer AI allows users to create videos and images quickly by typing descriptions or uploading photos. It offers various models for different quality and speed needs, making it suitable for social media content without the hassle of filming or editing. Users can start with 10 free credits and have full rights to their creations.
Google is set to launch Nano Banana 2, which improves on its predecessor with better image processing capabilities, including precise coloring and error correction. The new model features an iterative workflow for enhanced accuracy and supports various aspect ratios and resolutions. Internal testing hints at a possible rebranding to “Nano Banana Pro.”
Google is adding an "import AI chats" feature to Gemini, allowing users to upload conversations from other platforms. This aims to retain users' chat history and improve context. Additionally, Gemini will offer higher-resolution image downloads and a new feature related to video content verification.
YouArt offers a platform for automating creative processes, allowing users to generate and edit images and videos using various AI models. It features a creative agent that can manage workflows and provides free templates to help users get started. Membership includes access to multiple advanced AI tools and a credit system for generating content.
This article provides a comprehensive guide on using the Nano Banana Pro model in Google AI Studio. It covers setup, advanced features like "thinking" capabilities, 4K generation, and best practices for effective prompting.
Meta is creating a new AI model called Mango for image and video generation, along with a text-based model named Avocado aimed at improving coding capabilities. These models are expected to launch in early 2026, following a major restructuring of Meta's AI team that included hiring over 20 researchers from OpenAI.
This article discusses recent developments in AI, including the release of OpenAI's GPT 5.2 and Image 1.5, along with a new deal with Disney and a lawsuit against Google. It also touches on regulatory efforts by the Trump Administration and various AI models' performances.
The article explores the parallels between the film "Lord of War" and the current AI compute market, focusing on Jensen Huang of Nvidia as the central figure in the AI arms race. It details an experiment where multiple AI image generation models were tested to recreate a parody poster, "Lord of Tokens," using advanced prompts that challenge the models' capabilities. The results highlight varying levels of success in achieving the desired artistic and technical details.
Liquid is an innovative auto-regressive model that integrates visual comprehension and generation by tokenizing images into discrete codes and learning them alongside text tokens. This multimodal large language model operates within a shared feature space, allowing for seamless understanding and generation without relying on external visual embeddings. Liquid is available in multiple sizes and explores the scaling laws of multimodal models, revealing mutual benefits between understanding and generation tasks.
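The core trick, turning image patches into discrete codes a language model can consume next to text tokens, is nearest-neighbor lookup in a learned codebook. A toy sketch (illustrative vectors and codebook, not Liquid's tokenizer):

```python
# Toy vector quantization: map each image-patch vector to the index of its
# nearest codebook entry, so the image becomes a sequence of discrete tokens
# that can sit in the same stream as text tokens.

def quantize(patches, codebook):
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(codebook)), key=lambda i: dist2(p, codebook[i]))
            for p in patches]

codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
patches = [[0.1, 0.1], [0.9, 0.2], [0.2, 0.8], [1.1, 0.9]]
tokens = quantize(patches, codebook)  # one discrete token per patch
```

In the real model the codebook is learned jointly with the network, and generation runs the lookup in reverse: predict token indices, then decode them back to pixels.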
OpenAI is testing a watermark feature for images generated with GPT-4o in ChatGPT, prompted largely by the popularity of users generating Studio Ghibli-style art. The watermark will be applied to images created by free users, while ChatGPT Plus subscribers will have the option to save images without it. An ImageGen API is also in development, allowing developers to build their own applications.
ConceptAttention is an interpretability method designed for multi-modal diffusion transformers, specifically implemented for the Flux DiT architecture using PyTorch. The article provides installation instructions and a code example for generating images and concept attention heatmaps. It also references the associated research paper for further details.
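The heatmaps boil down to scoring each image patch against a concept embedding and normalizing. A toy version of that math (hypothetical two-dimensional features; the real method reads these maps out of the Flux DiT's attention layers):

```python
import math

# Toy concept-attention heatmap: softmax-normalized dot products between a
# concept embedding and per-patch features give a distribution over patches,
# i.e. where the concept "lives" in the image.

def concept_heatmap(patch_feats, concept):
    scores = [sum(p * c for p, c in zip(patch, concept)) for patch in patch_feats]
    m = max(scores)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

patches = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
heat = concept_heatmap(patches, [1.0, 0.0])  # weight peaks on the first patch
assert abs(sum(heat) - 1.0) < 1e-9
```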
A new library for image generation using ChatGPT has been made available, allowing users to create images based on text prompts. This development enhances the capabilities of AI in the creative field, enabling seamless integration of text-to-image generation for various applications.
Qwen Chat provides a wide range of functionalities, including chatbot capabilities, image and video understanding, and image generation. It also supports document processing, web search integration, and tool utilization, making it a versatile solution for various tasks.
Qwen-Image, a 20B MMDiT image foundation model, offers advanced capabilities in complex text rendering and image editing, outperforming existing models in various benchmarks. Its strengths include high-fidelity text generation in both English and Chinese, consistent image editing, and versatility in artistic styles, making it a powerful tool for content creators. The model aims to lower barriers in visual content creation and foster community engagement in generative AI development.
A novel image generation approach called Next Visual Granularity (NVG) is introduced, which decomposes images into structured sequences to progressively refine them from a global layout to fine details. The NVG framework allows for high-fidelity and diverse image generation by utilizing a hierarchical representation that guides the process based on input text and current canvas. Extensive training on the ImageNet dataset demonstrates NVG's superior performance compared to previous models, with clear scaling behavior and improved FID scores.
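The coarse-to-fine progression can be sketched with a toy 1-D "canvas" refined over shrinking block sizes (illustrative only; NVG learns each refinement stage rather than averaging a known target):

```python
# Toy coarse-to-fine generation in the spirit of NVG: start from a global
# average "layout" and refine the canvas over progressively finer blocks
# until per-pixel detail is reached.

def refine(target, block):
    # Replace each block of the canvas with that block's mean in `target`.
    canvas = []
    for i in range(0, len(target), block):
        chunk = target[i:i + block]
        canvas += [sum(chunk) / len(chunk)] * len(chunk)
    return canvas

target = [1.0, 3.0, 5.0, 7.0]
levels = [refine(target, b) for b in (4, 2, 1)]
# levels[0]: global layout, levels[1]: coarse structure, levels[2]: full detail
```

Each level adds one granularity of structure, which is exactly the hierarchy the model conditions on (text prompt plus current canvas) at every step.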
Imagen 4, Google's latest text-to-image model, is now available for paid preview in the Gemini API and for limited free testing in Google AI Studio. It includes two variants, Imagen 4 for general tasks and Imagen 4 Ultra for precision, both featuring improved text rendering and image generation quality. All generated images will include a non-visible digital watermark for trust and transparency.
The article discusses various updates and features introduced at DevDay 2025, including the Sora 2 SDK, advancements in GPT-5 with Pro AgentKit, and new capabilities in image generation and speech-to-speech mini models. These innovations are aimed at enhancing user experiences and expanding the functionality of applications powered by OpenAI technologies.
FLUX.1 Kontext [pro] is an advanced image generation and editing model that emphasizes prompt adherence. The article lists example API calls for tasks such as image generation, chat completions, and audio processing, though the model itself is not currently supported on Together AI.
Google has launched the Gemini 2.5 Flash Image model, now available to developers and enterprises through the Gemini API, Google AI Studio, and Vertex AI. This production-ready tool offers advanced features for image generation and editing, supporting multiple aspect ratios and enabling real-time applications at competitive pricing. Developers are already incorporating it into various creative and educational workflows.
The article presents the Decoupled Diffusion Transformer (DDT) architecture, demonstrating improved performance with a larger encoder in a diffusion model framework. It achieves state-of-the-art FID scores on ImageNet benchmarks and allows for accelerated inference by reusing encoders across steps. The implementation provides detailed configurations for training and inference, along with online demos.
REPA-E introduces a family of end-to-end tuned Variational Autoencoders (VAEs) that significantly improve text-to-image (T2I) generation quality and training efficiency. The method enables effective joint training of VAEs and diffusion models, achieving state-of-the-art performance on ImageNet and enhancing latent space structure across various VAE architectures. Results show accelerated generation performance and better image quality, making E2E-VAEs superior replacements for traditional VAEs.
OmniCaptioner is a versatile visual captioning framework designed to generate detailed textual descriptions across various visual domains, including natural images, visual text, and structured visuals. It enhances visual reasoning with large language models (LLMs), improves image generation tasks, and allows for efficient supervised fine-tuning by converting pixel data into rich semantic representations. The framework aims to bridge the gap between visual and textual modalities through a unified multimodal pretraining approach.
OpenAI reported that ChatGPT users have generated over 700 million images within a week, highlighting the rapid growth and popularity of its image generation capabilities. The surge in usage reflects a significant increase in user engagement and interest in AI-generated content.
ByteDance has launched its AI image generation tool, Seedream 4.0, claiming it surpasses Google DeepMind's Nano Banana in key performance metrics like prompt adherence and aesthetics. While Seedream 4.0 combines the capabilities of its predecessors and offers faster image processing, it has yet to be evaluated by major benchmark firms. The tool is currently available to domestic users and corporate clients at competitive pricing.
OpenAI is expanding its image-generating feature, gpt-image-1, to other developers and applications, including Adobe's Firefly and tools like Figma and Wix. This follows a surge in usage where over 130 million users created 700 million images in just the first week. Additionally, Microsoft will integrate OpenAI's image generation into its Microsoft 365 Copilot app, enhancing competition with Google in the generative AI market.
NVIDIA has introduced a new AI blueprint that facilitates the integration between Blender and AI image generation tools, enhancing the workflow for 3D artists. This development aims to streamline the creative process, allowing users to leverage AI capabilities directly within their 3D modeling environment.
Llama 4 Scout is a state-of-the-art 109 billion parameter model designed for tasks such as multi-document analysis, codebase reasoning, and personalized tasks. While it currently lacks support on Together AI, the platform offers a variety of APIs for different functionalities including chat completions, image generation, audio transcription, and video creation. Users can register for an account to access the API and utilize free credits to start their projects.
Large diffusion models like Flux can generate impressive images but require substantial memory, making quantization an attractive option to reduce their size without significantly affecting output quality. The article discusses various quantization backends available in Hugging Face Diffusers, including bitsandbytes, torchao, and Quanto, and provides examples of how to implement these quantizations to optimize memory usage and performance in image generation tasks.
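The memory saving comes from storing weights at low bit-width and dequantizing on the fly. A toy 4-bit affine scheme shows the idea (real backends such as bitsandbytes use smarter formats like NF4, but the round-trip is the same shape):

```python
# Toy 4-bit affine quantization: store weights as integers in [0, 15] plus a
# per-tensor scale and offset, then dequantize when the weights are needed.

def quantize_4bit(weights):
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 15 or 1.0          # 4 bits -> 16 levels
    q = [round((w - lo) / scale) for w in weights]
    return q, scale, lo

def dequantize(q, scale, lo):
    return [qi * scale + lo for qi in q]

w = [-0.8, -0.1, 0.0, 0.3, 0.75]
q, scale, lo = quantize_4bit(w)
w_hat = dequantize(q, scale, lo)
# Reconstruction error is bounded by half a quantization step.
assert all(abs(a - b) <= scale / 2 + 1e-9 for a, b in zip(w, w_hat))
```

Replacing 16- or 32-bit floats with 4-bit codes cuts weight storage by 4–8x, which is why a model the size of Flux becomes feasible on consumer GPUs.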
The article discusses the integration of ChatGPT with image generation capabilities, exploring how this combination can enhance user creativity and productivity. It highlights various applications and potential use cases, emphasizing the transformative impact of AI in visual content creation.
Representation Autoencoders (RAEs) enhance diffusion transformers by leveraging pretrained encoders and lightweight decoders to achieve superior image generation results, outperforming traditional methods like SD-VAE. The study reveals that RAE's reconstruction quality is high, and for optimal performance, the model width must match or exceed the encoder's token dimension. Additionally, the proposed DiTDH model demonstrates significant efficiency and effectiveness, setting new state-of-the-art scores in image generation tasks.
PixelFlow introduces a novel approach to image generation by operating directly in raw pixel space, eliminating the need for pre-trained Variational Autoencoders. This method enhances the image generation process with efficient cascade flow modeling, achieving a competitive FID score of 1.98 on the ImageNet benchmark while offering high-quality and semantically controlled image outputs. The work aims to inspire future developments in visual generation models.
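The cascade structure can be sketched in a few lines: generate a tiny canvas, then repeatedly upsample and refine, so most compute happens on small grids (a toy scaffold with placeholder refinement stages, not PixelFlow's learned flow models):

```python
# Toy pixel-space cascade: a 1x1 "global" canvas is upsampled and refined
# stage by stage, mirroring how cascade models spend most steps at low
# resolution and only the last ones at full resolution.

def upsample_nn(img, factor):
    # Nearest-neighbor upsample of a 2-D grid by an integer factor.
    out = []
    for row in img:
        wide = [v for v in row for _ in range(factor)]
        out += [wide[:] for _ in range(factor)]
    return out

def cascade(stages):
    # `stages` is a list of refinement functions, coarse to fine.
    img = [[0.0]]
    for refine in stages:
        img = refine(upsample_nn(img, 2))
    return img

# Two illustrative stages that just add detail to the upsampled canvas.
add_one = lambda img: [[v + 1.0 for v in row] for row in img]
final = cascade([add_one, add_one])   # 1x1 -> 2x2 -> 4x4
```

In PixelFlow each stage is a flow model conditioned on the upsampled output of the previous one, which is what lets it skip the VAE entirely.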
The paper presents BLIP3-o, a family of fully open unified multimodal models that enhance both image understanding and generation. It introduces a diffusion transformer for generating CLIP image features, advocates for a sequential pretraining strategy, and proposes a high-quality dataset, BLIP3o-60k, to improve performance across various benchmarks. The models, along with code and datasets, are open-sourced to foster further research.
OpenAI has introduced the `gpt-image-1` model for image generation via its API, allowing developers to integrate high-quality image creation into their products. The model supports diverse styles and applications, with notable collaborations from companies like Adobe, Canva, and HubSpot to enhance creative and marketing processes.
OpenAI has introduced new tools and features to its Responses API, enhancing capabilities for developers building agentic applications. Key updates include support for remote MCP servers, enhanced image generation, Code Interpreter integration, and improved reliability and privacy features for enterprises.
HunyuanImage-3.0 has been released as an open-source image generation model, featuring a unified multimodal architecture that integrates text and image understanding. It boasts the largest Mixture of Experts model with 80 billion parameters, enabling superior image generation capabilities while supporting extensive customization through various checkpoints and performance optimizations.
Llama 4 Maverick is a state-of-the-art multilingual model designed for image and text understanding, creative writing, and enterprise applications. While it is not yet supported on Together AI, users can register for an account to access various API functionalities, including image generation, chat completions, and audio transcriptions. The model allows for versatile applications such as generating videos and embeddings based on user prompts.
Google has launched Gemini 2.5 Flash Image, an advanced image generation and editing model that allows users to blend multiple images, maintain character consistency, and execute targeted transformations using natural language. The model is available through the Gemini API and Google AI Studio for developers, priced at $30 per million output tokens, and includes features for creating custom apps and educational tools. All generated images will carry an invisible digital watermark for identification as AI-generated content.
PixelFlow introduces a novel family of image generation models that operate directly in pixel space, eliminating the need for pre-trained VAEs and allowing for end-to-end training. By utilizing efficient cascade flow modeling, it achieves impressive image quality with a low FID score of 1.98 on the ImageNet benchmark, showcasing its potential for both class-to-image and text-to-image tasks. The model aims to inspire future advancements in visual generation technologies.
OpenAI is experimenting with visible and invisible watermarks for images generated by its ChatGPT-4o model to enhance content traceability and compliance. The visible watermark, labeled “ImageGen,” is being tested for free-tier users while paid users will receive images without watermarks. This move aligns with broader industry efforts to improve attribution for AI-generated content.
AI image generation can be both rewarding and challenging, with varying results based on the prompts used. Experts share tips and techniques for leveraging different AI tools effectively, including using style codes, brainstorming image ideas, and refining prompts. The article highlights the importance of experimentation and creativity in producing quality images with AI.
A stealth AI model has outperformed well-known competitors like DALL-E and Midjourney on a popular benchmark, demonstrating its advanced capabilities in image generation. The creators of this model have successfully secured $30 million in funding to further develop their technology.
GigaTok is a novel method designed for scaling visual tokenizers to 3 billion parameters, addressing the reconstruction vs. generation dilemma through semantic regularization. It offers a comprehensive framework for training and evaluating tokenizers, alongside various model configurations and instructions for setup and usage. The project is a collaboration involving extensive research and experimentation, with resources available for further exploration.
A new model labeled "GPT-5 Mini Scout" briefly appeared in ChatGPT's model selector, sparking speculation about its connection to a new Company Knowledge feature for enterprise users. An update in OpenAI’s JavaScript library hinted at the model being named "GPT-5.1 Mini," suggesting significant advancements in image generation capabilities. The potential rollout for this model is anticipated in November, possibly in response to competitors like Google's Gemini 3.
ByteDance has introduced a new AI image model aimed at competing with Google DeepMind's Nano Banana, showcasing advancements in image generation technology. This development highlights the growing rivalry in the AI landscape, particularly among major tech companies.
Google Cloud has expanded Vertex AI with three new generative AI media models: Imagen 4 for high-quality image generation, Veo 3 for advanced video creation with audio, and Lyria 2 for music generation. These tools aim to enhance content creation efficiency and creativity across various industries, enabling users to produce stunning visual and audio assets more rapidly.
Generating detailed images with AI has become more accessible by connecting Claude to Hugging Face Spaces, enabling users to leverage advanced models like FLUX.1 Krea and Qwen-Image. These models enhance image realism and text quality, allowing for creative projects such as posters and marketing materials. Users can easily configure and switch between these models to achieve desired results.
Midjourney has unveiled its latest AI image model, marking its first significant release in nearly a year. The new model focuses on enhanced image generation capabilities, providing users with improved tools for creative expression. This update reflects Midjourney's commitment to advancing AI technology in the visual arts.
HiDream-I1 is an open-source image generative foundation model boasting 17 billion parameters, delivering high-quality image generation in seconds. Its recent updates include the release of various models and integrations with popular platforms, enhancing its usability for developers and users alike. For full capabilities, users can explore additional resources and demos linked in the article.
VARGPT-v1.1 is a powerful multimodal model that enhances visual understanding and generation capabilities through iterative instruction tuning and reinforcement learning. It includes extensive code releases for training, inference, and evaluation, as well as a comprehensive structure for multimodal tasks such as image captioning and visual question answering. The model's checkpoints and datasets are available on Hugging Face, facilitating further research and application development.
UCGM is an official PyTorch implementation that provides a unified framework for training and sampling continuous generative models, such as diffusion and flow-matching models. It enables significant acceleration of sampling processes and efficient tuning of pre-trained models, achieving impressive FID scores across various datasets and resolutions. The framework supports diverse architectures and offers tools for both training and evaluating generative models.
Google has launched a preview of its Gemini 2.0 Flash image generation capabilities, enabling developers to integrate enhanced conversational image generation and editing with improved visual quality and reduced filter block rates. The Gemini API is available through Google AI Studio and Vertex AI, encouraging developers to explore its functionalities, including recontextualizing products in new environments.
xAI is set to enhance its Grok app with the introduction of a new character, Valentin, and a feature called Imagine that enables infinite image and video generation with sound. These updates aim to attract creative users, particularly women, by offering customizable experiences and a focus on user-generated content. The launch is anticipated to coincide with the release of GPT-5, positioning Grok as a competitive player in the generative AI landscape.
CogView4-6B is a text-to-image generation model that supports a range of resolutions and offers optimized memory usage through CPU offloading. The model has demonstrated impressive performance benchmarks compared to other models like DALL-E 3 and SDXL, achieving high scores across various evaluation metrics. Users can install the necessary libraries and use a provided code snippet to generate images based on detailed prompts.