StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets
Computer graphics has experienced a recent surge of data-centric approaches
for photorealistic and controllable content creation. StyleGAN in particular
sets new standards for generative modeling regarding image quality and
controllability. However, StyleGAN's performance severely degrades on large
unstructured datasets such as ImageNet. StyleGAN was designed for
controllability; hence, prior works suspect its restrictive design to be
unsuitable for diverse datasets. In contrast, we find the main limiting factor
to be the current training strategy. Following the recently introduced
Projected GAN paradigm, we leverage powerful neural network priors and a
progressive growing strategy to successfully train the latest StyleGAN3
generator on ImageNet. Our final model, StyleGAN-XL, sets a new
state of the art for large-scale image synthesis and is the first to generate
images at a resolution of $1024^2$ at such a dataset scale. We demonstrate that
this model can invert and edit images beyond the narrow domain of portraits or
specific object classes.