InstantAvatar: Learning Avatars from Monocular Video in 60 Seconds
In this paper, we take a significant step towards real-world applicability of
monocular neural avatar reconstruction by contributing InstantAvatar, a system
that can reconstruct human avatars from a monocular video within seconds; the
resulting avatars can be animated and rendered at interactive rates. To achieve
this efficiency, we propose a carefully designed and engineered system that
leverages emerging acceleration structures for neural fields, in combination
with an efficient empty space-skipping strategy for dynamic scenes. We also
contribute an efficient implementation that we will make available for research
purposes. Compared to existing methods, InstantAvatar converges 130x faster and
can be trained in minutes instead of hours. It achieves comparable or even
better reconstruction quality and novel pose synthesis results. When given the
same time budget, our method significantly outperforms state-of-the-art methods.
InstantAvatar can yield acceptable visual quality in as little as 10 seconds of
training time.
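
The abstract does not spell out how the empty space-skipping strategy works. As a rough illustration of the general idea only, and not the authors' implementation, the sketch below assumes a coarse occupancy grid over the cube [-1, 1)^3 and keeps only those ray samples that fall in occupied voxels, so that the expensive neural field query is skipped elsewhere. All names (`voxel_of`, `build_occupancy`, `sample_ray`) and parameters (grid resolution, step count, ray bounds) are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Vec3 = Tuple[float, float, float]
Voxel = Tuple[int, int, int]

LO, HI = -1.0, 1.0  # assumed scene bounds (hypothetical)


def voxel_of(p: Vec3, grid_res: int) -> Voxel:
    """Map a point inside [LO, HI)^3 to integer voxel indices."""
    scale = grid_res / (HI - LO)
    return tuple(min(grid_res - 1, max(0, int((c - LO) * scale))) for c in p)


def build_occupancy(points: Iterable[Vec3], grid_res: int) -> Set[Voxel]:
    """Mark every voxel that contains at least one surface/body point.

    For a dynamic scene, the points could come from the posed body in one
    frame, or from several frames merged together (an assumption here).
    """
    return {voxel_of(p, grid_res) for p in points}


def sample_ray(origin: Vec3, direction: Vec3, occupied: Set[Voxel],
               grid_res: int, n_steps: int = 64,
               t_near: float = 0.0, t_far: float = 2.0) -> List[float]:
    """Return the depths t of uniform ray samples that land in occupied voxels."""
    kept = []
    dt = (t_far - t_near) / n_steps
    for i in range(n_steps):
        t = t_near + (i + 0.5) * dt  # midpoint of the i-th interval
        p = tuple(o + t * d for o, d in zip(origin, direction))
        if any(c < LO or c >= HI for c in p):
            continue  # outside the grid: treated as empty space
        if voxel_of(p, grid_res) in occupied:
            kept.append(t)  # only these samples would be sent to the network
    return kept


# Usage: one occupied voxel around the origin, two rays along +x.
occupied = build_occupancy([(0.0, 0.0, 0.0)], grid_res=8)
hit = sample_ray((-0.99, 0.1, 0.1), (1.0, 0.0, 0.0), occupied, grid_res=8)
miss = sample_ray((-0.99, 0.9, 0.9), (1.0, 0.0, 0.0), occupied, grid_res=8)
```

In this toy setup, the first ray keeps only the few samples that pass through the occupied voxel, and the second ray keeps none, so most network evaluations are avoided.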