ELICIT: Learning Human-Specific Neural Fields from a Single Image
One-shot Implicit Animatable Avatars with Model-based Priors
To enable the data-efficient creation of realistic, animatable 3D humans, we propose a novel method for learning human-specific neural radiance fields from a single image.
Inspired by the fact that humans can easily reconstruct body geometry and infer full-body clothing from a single image, we leverage two priors in our method: a 3D geometry prior and a visual semantic prior.
Both priors jointly guide the optimization to create plausible content in the invisible areas.
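To make the joint guidance concrete, the sketch below shows one plausible way such an objective could be assembled in PyTorch. Everything here is our own illustrative assumption rather than ELICIT's actual code: `TinyField`, the loss names, and the weights are hypothetical. The visible view is supervised with a pixel reconstruction loss, invisible areas are constrained by a CLIP-style embedding similarity to the input image, and the field's density is regularized toward the occupancy of a fitted SMPL body mesh.

```python
# Illustrative sketch only: combining a single-view reconstruction loss with
# a visual semantic prior and a body geometry prior. Names and weights are
# hypothetical, not ELICIT's actual implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyField(nn.Module):
    """Stand-in for the human-specific neural radiance field."""
    def __init__(self, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # RGB + occupancy-like density
        )

    def forward(self, pts):  # pts: (N, 3)
        out = self.mlp(pts)
        rgb = torch.sigmoid(out[..., :3])
        occ = torch.sigmoid(out[..., 3])  # simplified occupancy in [0, 1]
        return rgb, occ

def reconstruction_loss(rendered_rgb, observed_rgb):
    # Pixel-level supervision on the single visible view. For brevity the
    # field outputs are treated as rendered colors; a real pipeline would
    # volume-render along camera rays first.
    return F.mse_loss(rendered_rgb, observed_rgb)

def semantic_loss(novel_view_emb, reference_emb):
    # Cosine distance between an image embedding (e.g. CLIP) of a rendered
    # novel view and that of the input image: keeps invisible regions
    # semantically consistent with the observed clothing and appearance.
    return 1.0 - F.cosine_similarity(novel_view_emb, reference_emb, dim=-1).mean()

def geometry_loss(pred_occ, smpl_occ):
    # Pull the field's density toward the occupancy of a fitted SMPL body
    # mesh, so unseen areas still follow a plausible body shape.
    return F.binary_cross_entropy(pred_occ, smpl_occ)

def joint_objective(field, batch, w_rec=1.0, w_sem=0.1, w_geo=0.5):
    rgb, occ = field(batch["pts"])
    loss = w_rec * reconstruction_loss(rgb, batch["observed_rgb"])
    # batch["novel_emb"] stands for the embedding of a novel view rendered
    # from the current field; producing it needs a differentiable renderer
    # and an image encoder, both omitted here.
    loss = loss + w_sem * semantic_loss(batch["novel_emb"], batch["ref_emb"])
    loss = loss + w_geo * geometry_loss(occ, batch["smpl_occ"])
    return loss
```

The relative weights `w_sem` and `w_geo` are placeholders; in practice they would be tuned so the priors fill in unseen regions without overriding the directly observed view.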
Comprehensive evaluations on multiple popular benchmarks, including ZJU-MoCap, Human3.6M, and DeepFashion, show that our method outperforms current state-of-the-art avatar creation methods when only a single image is available.
Authors
Yangyi Huang, Hongwei Yi, Weiyang Liu, Haofan Wang, Boxi Wu, Wenxiao Wang, Binbin Lin, Debing Zhang, Deng Cai
We present a novel approach, called ELICIT, for training an animatable neural radiance field from a single image.
We explicitly use a body-shape geometry prior and a visual clothing semantic prior to guide the optimization and achieve free-view rendering from a single input image.
Our method enables the creation of animatable avatars that can be rendered from arbitrary viewpoints in arbitrary poses.
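To give intuition for how such an avatar can be driven by new poses, here is a minimal sketch (our simplification, not ELICIT's exact formulation) of pose-conditioned querying via inverse linear blend skinning: points sampled in the posed space are warped back to a canonical pose using per-joint SMPL transforms, and the learned canonical field is queried there. All function names are illustrative.

```python
# Hypothetical sketch of pose-driven querying of a canonical field via
# inverse linear blend skinning (LBS); not ELICIT's actual implementation.
import torch

def inverse_lbs(pts_posed, bone_transforms, skin_weights):
    """Warp posed-space points back to the canonical pose.

    pts_posed:       (N, 3) sample points along the camera rays
    bone_transforms: (J, 4, 4) per-joint posed-to-canonical transforms
    skin_weights:    (N, J) blend weights, e.g. copied from the nearest
                     vertex of the fitted SMPL mesh
    """
    # Blend the per-joint rigid transforms for each point.
    blended = torch.einsum("nj,jab->nab", skin_weights, bone_transforms)
    # Apply the blended transform in homogeneous coordinates.
    homo = torch.cat([pts_posed, torch.ones_like(pts_posed[:, :1])], dim=-1)
    return torch.einsum("nab,nb->na", blended, homo)[:, :3]

def query_in_pose(field, pts_posed, bone_transforms, skin_weights):
    # Query the canonical field at the warped locations, so any target
    # pose reuses the appearance learned from the single input image.
    pts_canonical = inverse_lbs(pts_posed, bone_transforms, skin_weights)
    return field(pts_canonical)
```

Because only the query points move, novel views come for free under this scheme: the camera can be placed anywhere while the same canonical field supplies color and density.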
We conduct quantitative and qualitative comparisons with previous state-of-the-art human-specific neural rendering methods in the single-image input setting.
Results
We introduce a novel method that constructs an animatable implicit representation from a single input image and generates a free-view video of the character performing a target motion.
Two model-based priors drive the method's one-shot optimization: a vision-model-based visual semantic prior and a human body prior, which together enable the reconstruction of body geometry and the inference of full-body clothing.
Compared with prior work on novel-view and novel-pose synthesis, we demonstrate superior performance in the single-image setting and strong generalization to real-world human images.