Object compositing based on 2D images is challenging because producing
realistic results typically involves multiple processing stages, such as color
harmonization, geometry correction, and shadow generation.
Furthermore, annotating training data pairs for compositing requires
substantial manual effort from professionals and does not scale well. Building
on recent advances in generative models, we propose a self-supervised
framework for object compositing that leverages conditional diffusion models.
Our framework holistically addresses the object compositing task in a unified
model, transforming the viewpoint, geometry, color, and shadow of the
generated object while requiring no manual labeling. To preserve the input
object's characteristics, we introduce a
content adaptor that helps maintain categorical semantics and object
appearance. We further adopt a data augmentation method to improve the
fidelity of the generator. In a user study on various real-world images, our
method outperforms relevant baselines in both the realism and the faithfulness
of the synthesized results.
Authors
Yizhi Song, Zhifei Zhang, Zhe Lin, Scott Cohen, Brian Price, Jianming Zhang, Soo Ye Kim, Daniel Aliaga