MetaPortrait: Identity-Preserving Talking Head Generation with Fast Personalized Adaptation
In this work, we propose an ID-preserving talking head generation framework,
which advances previous methods in two aspects. First, as opposed to
interpolating from sparse flow, we claim that dense landmarks are crucial to
achieving accurate geometry-aware flow fields. Second, inspired by
face-swapping methods, we adaptively fuse the source identity during synthesis,
so that the network better preserves the key characteristics of the image
portrait. Although the proposed model surpasses prior methods in generation fidelity on established benchmarks, personalized fine-tuning is usually still needed to make talking head generation ready for practical use. However, this process is computationally demanding and unaffordable for ordinary users. To solve this, we propose a fast adaptation model based on a meta-learning approach. The learned model can be adapted to a high-quality personalized model in as little as 30 seconds. Last but not least, a spatial-temporal enhancement module is proposed to improve fine details while ensuring temporal coherence. Extensive experiments demonstrate the significant superiority of our approach over the state of the art in both one-shot and personalized settings.
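To make the fast-personalization idea concrete, below is a minimal sketch of how a meta-learned initialization can be adapted to a new identity with only a few gradient steps, assuming a Reptile-style outer loop. The tiny Generator module, the synthetic data, and all function names are hypothetical stand-ins, not the paper's actual architecture or training code.

```python
# Minimal sketch (assumption, not the paper's code): a Reptile-style meta-learning
# loop that learns an initialization which can be personalized to a new identity
# with only a few gradient steps.
import copy
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Hypothetical stand-in for the talking-head generator (driving features -> frame)."""
    def __init__(self, feat_dim=64, out_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, out_dim),
        )

    def forward(self, driving_feat):
        return self.net(driving_feat)

def inner_loop(model, batch, steps=5, lr=1e-3):
    """A few gradient steps on one identity's data (the personalization phase)."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    feats, frames = batch
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.l1_loss(model(feats), frames)
        loss.backward()
        opt.step()
    return model

def reptile_meta_train(meta_model, identities, meta_iters=100, meta_lr=0.1):
    """Move the meta-initialization toward each identity's adapted weights."""
    for it in range(meta_iters):
        batch = identities[it % len(identities)]
        adapted = inner_loop(copy.deepcopy(meta_model), batch)
        with torch.no_grad():
            for p_meta, p_adapt in zip(meta_model.parameters(), adapted.parameters()):
                p_meta += meta_lr * (p_adapt - p_meta)
    return meta_model

if __name__ == "__main__":
    torch.manual_seed(0)
    # Synthetic "identities": (driving features, target frames) pairs.
    identities = [(torch.randn(16, 64), torch.randn(16, 128)) for _ in range(8)]
    meta_model = reptile_meta_train(Generator(), identities)
    # Fast personalization to a held-out identity: a handful of steps from the
    # meta-learned initialization, analogous to the ~30 s adaptation described above.
    new_identity = (torch.randn(16, 64), torch.randn(16, 128))
    personalized = inner_loop(copy.deepcopy(meta_model), new_identity, steps=10)
```

The design intuition is that the meta-learned weights sit close to a good solution for every identity seen during meta-training, so only a short burst of fine-tuning is needed for an unseen person rather than full per-subject training.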
Authors
Bowen Zhang, Chenyang Qi, Pan Zhang, Bo Zhang, HsiangTao Wu, Dong Chen, Qifeng Chen, Yong Wang, Fang Wen