Improving Sample Quality of Diffusion Models Using Self-Attention Guidance
Denoising diffusion models (DDMs) have been actively researched and have attracted strong attention for their ability to generate high-quality, diverse images.
However, the self-attention mechanism inside denoising diffusion models remains under-explored.
We verify our hypotheses about the self-attention map by conducting frequency analysis and testing its relationship with the generated objects.
As a result, we find that the attention map is closely related to the quality of generated images and can guide existing pretrained diffusion models to generate images with higher fidelity.
Beyond the improved sample quality obtained when our method is used alone, we show that results improve further when it is combined with classifier guidance on ImageNet 128×128.
Authors
Susung Hong, Gyuseong Lee, Wooseok Jang, Seungryong Kim