On Distillation of Guided Diffusion Models
Classifier-free guided diffusion models have recently been shown to be highly
effective at high-resolution image generation, and they have been widely used
in large-scale diffusion frameworks including DALL-E 2, GLIDE and Imagen.
However, a downside of classifier-free guided diffusion models is that they are
computationally expensive at inference time since they require evaluating two
diffusion models, a class-conditional model and an unconditional model,
hundreds of times. To deal with this limitation, we propose an approach to
distilling classifier-free guided diffusion models into models that are fast to
sample from: Given a pre-trained classifier-free guided model, we first learn a
single model to match the output of the combined conditional and unconditional
models, and then progressively distill that model to a diffusion model that
requires far fewer sampling steps. On ImageNet 64x64 and CIFAR-10, our
approach is able to generate images visually comparable to those of the original
model using as few as 4 sampling steps, achieving FID/IS scores comparable to
those of the original model while being up to 256 times faster to sample from.
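The two-model evaluation at the heart of classifier-free guidance combines the conditional and unconditional noise predictions with a guidance weight, which is why each sampling step costs two model calls before distillation. A minimal sketch of this combination (the function and variable names are illustrative, not from the paper):

```python
def guided_eps(eps_cond, eps_uncond, w):
    """Classifier-free guidance: blend the conditional and unconditional
    noise predictions with guidance weight w. w = 0 recovers the purely
    conditional prediction; larger w pushes samples toward the condition."""
    return (1.0 + w) * eps_cond - w * eps_uncond
```

The proposed first distillation stage trains a single network to match this combined output directly, so the distilled model needs only one evaluation per step.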