We present a new implicit warping framework for image animation that transfers
the motion of a driving video to a set of source images. A single
cross-modal attention layer is used to find correspondences between the source
images and the driving image, choose the most appropriate features from
different source images, and warp the selected features. This is in contrast to
existing methods that use explicit flow-based warping, which is designed
for animation using a single source and does not extend well to multiple
sources. The pick-and-choose capability of our framework helps it achieve
state-of-the-art results on multiple datasets for image animation using both
single and multiple source images. The project website is available at
this https URL warping/
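
The single attention layer described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the function name `implicit_warp`, the feature shapes, the scaled dot-product scoring, and the random inputs are all hypothetical. Queries stand in for driving-image features, while keys and values stand in for features from several source images, concatenated so that one softmax can weigh every source location at once.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def implicit_warp(queries, source_keys, source_values):
    """Hypothetical single-layer cross-attention warp.

    queries:       (Nq, d) array, one row per driving-image location.
    source_keys:   list of (Ns_i, d) arrays, one per source image.
    source_values: list of (Ns_i, c) arrays, one per source image.
    Returns the warped features (Nq, c) and the attention weights.
    """
    # Pool all source images into one key/value set.
    keys = np.concatenate(source_keys, axis=0)
    values = np.concatenate(source_values, axis=0)

    # Correspondence scores between driving and source locations
    # (scaled dot product is an assumption of this sketch).
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])

    # One softmax over every location of every source image:
    # this is where features are picked across sources.
    weights = softmax(scores, axis=-1)

    # The weighted sum of source features is the warped output.
    return weights @ values, weights


# Toy example: 4 driving locations, 3 source images with 6 locations each.
rng = np.random.default_rng(0)
d, c = 16, 8
queries = rng.standard_normal((4, d))
source_keys = [rng.standard_normal((6, d)) for _ in range(3)]
source_values = [rng.standard_normal((6, c)) for _ in range(3)]

warped, weights = implicit_warp(queries, source_keys, source_values)
print(warped.shape, weights.shape)
```

In this sketch, adding another source image only lengthens the key and value lists; no per-source flow field has to be estimated or merged, which is the property the abstract contrasts with explicit flow-based warping.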