On Distillation of Guided Diffusion Models
Classifier-free guided diffusion models have recently been shown to be highly
effective at high-resolution image generation, and they have been widely used
in large-scale diffusion frameworks including DALL-E 2, GLIDE and Imagen.
However, a downside of classifier-free guided diffusion models is that they are
computationally expensive at inference time since they require evaluating two
diffusion models, a class-conditional model and an unconditional model,
hundreds of times. To deal with this limitation, we propose an approach to
distilling classifier-free guided diffusion models into models that are fast to
sample from: Given a pre-trained classifier-free guided model, we first learn a
single model to match the output of the combined conditional and unconditional
models, and then progressively distill that model to a diffusion model that
requires far fewer sampling steps. On ImageNet 64x64 and CIFAR-10, our
approach is able to generate images visually comparable to those of the original
model using as few as 4 sampling steps, achieving FID/IS scores comparable to
those of the original model while being up to 256 times faster to sample from.
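The two-model evaluation at the heart of classifier-free guidance combines the conditional and unconditional noise predictions with a guidance weight, which is why each sampling step costs two model calls before distillation. A minimal sketch of this combination (the function and variable names are illustrative, not from the paper):

```python
def guided_eps(eps_cond, eps_uncond, w):
    """Classifier-free guidance: blend the conditional and unconditional
    noise predictions with guidance weight w. w = 0 recovers the purely
    conditional prediction; larger w pushes samples toward the condition."""
    return (1.0 + w) * eps_cond - w * eps_uncond
```

The proposed first distillation stage trains a single network to match this combined output directly, so the distilled model needs only one evaluation per step.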