A novel image generation approach called Next Visual Granularity (NVG) is introduced, which decomposes images into structured sequences to progressively refine them from a global layout to fine details. The NVG framework allows for high-fidelity and diverse image generation by utilizing a hierarchical representation that guides the process based on input text and current canvas. Extensive training on the ImageNet dataset demonstrates NVG's superior performance compared to previous models, with clear scaling behavior and improved FID scores.
image-generation ✓
+ visual-granularity
hierarchical-representation ✓
deep-learning ✓
image-quality ✓