Efficient Spatially Sparse Inference for Conditional GANs and Diffusion Models
During image editing, existing deep generative models tend to re-synthesize
the entire output from scratch, including the unedited regions. This leads to a
significant waste of computation, especially for minor editing operations. In
this work, we present Spatially Sparse Inference (SSI), a general-purpose
technique that selectively performs computation for edited regions and
accelerates various generative models, including both conditional GANs and
diffusion models. Our key observation is that users tend to make gradual
changes to the input image. This motivates us to cache and reuse the feature
maps of the original image. Given an edited image, we sparsely apply the
convolutional filters to the edited regions while reusing the cached features
for the unedited regions. Based on our algorithm, we further propose the Sparse
Incremental Generative Engine (SIGE), which converts the computation reduction
into a latency reduction on off-the-shelf hardware. With edited regions covering
only 1.2% of the image area, our method reduces the computation of DDIM by
7.5$\times$ and GauGAN by 18$\times$ while preserving visual fidelity. With
SIGE, we accelerate DDIM by 3.0$\times$ on an NVIDIA RTX 3090 GPU and
6.6$\times$ on an Apple M1 Pro CPU, and GauGAN by 4.2$\times$ on the RTX 3090
and 14$\times$ on the M1 Pro CPU.
Authors
Muyang Li, Ji Lin, Chenlin Meng, Stefano Ermon, Song Han, Jun-Yan Zhu
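
To make the sparse update concrete, below is a minimal PyTorch sketch of the idea described in the abstract: run the convolution on the original image once, cache its output, and for an edited image recompute only the tiles whose inputs changed. The function name `sparse_conv_with_cache`, the tile size, and the difference threshold are illustrative assumptions, not SIGE's actual API; the real engine gathers active blocks into a single batched convolution rather than looping over tiles in Python.

```python
import torch
import torch.nn.functional as F

def sparse_conv_with_cache(edited, original, cached_out, weight, bias=None,
                           block=8, tol=1e-3):
    """Hypothetical sketch: recompute a stride-1 'same' convolution only on
    tiles that changed between `original` and `edited`.

    edited, original: (1, C_in, H, W) inputs (H, W divisible by `block`)
    cached_out:       (1, C_out, H, W) conv output precomputed on `original`
    weight, bias:     parameters of an odd-sized square kernel (e.g., 3x3)
    """
    # 1. Reduce the per-pixel difference to a per-tile activity mask.
    diff = (edited - original).abs().amax(dim=1, keepdim=True)  # (1, 1, H, W)
    tile = F.max_pool2d(diff, kernel_size=block)                # (1, 1, H/b, W/b)
    # Dilate the mask by one tile: outputs whose receptive field merely
    # touches an edited pixel must also be recomputed for exact results.
    tile = F.max_pool2d(tile, kernel_size=3, stride=1, padding=1)
    active = tile[0, 0] > tol                                   # (H/b, W/b)

    out = cached_out.clone()
    pad = weight.shape[-1] // 2
    x = F.pad(edited, (pad, pad, pad, pad))

    # 2. For each active tile, gather the tile plus a halo of `pad` pixels
    #    (the conv receptive field), convolve it, and scatter the result
    #    back over the cached output. Inactive tiles keep cached values.
    for ty, tx in active.nonzero():
        y0, x0 = ty.item() * block, tx.item() * block
        patch = x[:, :, y0:y0 + block + 2 * pad, x0:x0 + block + 2 * pad]
        out[:, :, y0:y0 + block, x0:x0 + block] = F.conv2d(patch, weight, bias)
    return out

# Usage: one dense pass on the original, then cheap incremental passes.
weight = torch.randn(16, 3, 3, 3)
original = torch.randn(1, 3, 64, 64)
edited = original.clone()
edited[:, :, 8:16, 8:16] += 0.5                      # a small local edit
cached_out = F.conv2d(original, weight, padding=1)   # full pass, done once
out = sparse_conv_with_cache(edited, original, cached_out, weight)
```

Because the activity mask is dilated by one tile (which covers the kernel's halo when `pad` is at most `block`), `out` in this sketch matches the dense `F.conv2d(edited, weight, padding=1)` exactly while doing work only near the edit.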