Rodin: A Generative Model for Sculpting 3D Digital Avatars Using Diffusion
This paper presents a 3D generative model that uses diffusion models to
automatically generate 3D digital avatars represented as neural radiance
fields. A significant challenge in generating such avatars is that the memory
and processing costs in 3D are prohibitive for producing the rich details
required for high-quality avatars. To tackle this problem, we propose the
roll-out diffusion network (Rodin), which represents a neural radiance field as
multiple 2D feature maps and rolls out these maps into a single 2D feature
plane within which we perform 3D-aware diffusion. The Rodin model brings the
much-needed computational efficiency while preserving the integrity of
diffusion in 3D by using 3D-aware convolution that attends to projected
features in the 2D feature plane according to their original relationship in
3D. We also use latent conditioning to orchestrate the feature generation for
global coherence, leading to high-fidelity avatars and enabling their semantic
editing based on text prompts. Finally, we use hierarchical synthesis to
further enhance details. The 3D avatars generated by our model compare
favorably with those produced by existing generative techniques. We can
generate highly detailed avatars with realistic hairstyles and facial hair
such as beards. We also demonstrate 3D avatar generation from an image or a
text prompt, as well as text-guided editability.
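
To make the roll-out and the 3D-aware convolution concrete, below is a
minimal PyTorch sketch. The class name RollOut3DAwareConv, the tri-plane
axis conventions, and the mean-pooling used to summarize each plane along
its depth axis are illustrative assumptions, not the paper's exact
implementation; the core idea is that a point on one plane corresponds to a
line on each of the other two planes, so those lines are aggregated and
handed to an ordinary 2D convolution as extra channels.

# A minimal sketch of one 3D-aware convolution step on tri-plane features.
# Axis conventions and the mean-pooling scheme are assumptions for
# illustration, not the authors' exact implementation.
import torch
import torch.nn as nn

class RollOut3DAwareConv(nn.Module):
    # Updates the XY plane of a tri-plane representation. The other two
    # planes are pooled along the z axis and broadcast back onto the XY
    # grid, so the 2D convolution sees 3D-consistent context.
    def __init__(self, channels):
        super().__init__()
        # 3x input channels: the plane itself plus two pooled context maps.
        self.conv = nn.Conv2d(3 * channels, channels, kernel_size=3, padding=1)

    def forward(self, xy, xz, yz):
        # Assumed layouts (square planes, H == W):
        #   xy[b, c, y, x], xz[b, c, z, x], yz[b, c, z, y]
        B, C, H, W = xy.shape
        # A point (y, x) on XY projects to the full z-column at x on XZ and
        # at y on YZ; approximate each such line by its mean over z, then
        # broadcast the result back to an H x W map.
        xz_ctx = xz.mean(dim=2, keepdim=True).expand(-1, -1, H, -1)
        yz_ctx = yz.mean(dim=2, keepdim=True).transpose(2, 3).expand(-1, -1, -1, W)
        return self.conv(torch.cat([xy, xz_ctx, yz_ctx], dim=1))

# Usage: three 32x32 feature planes with 8 channels each.
xy, xz, yz = (torch.randn(2, 8, 32, 32) for _ in range(3))
new_xy = RollOut3DAwareConv(8)(xy, xz, yz)   # (2, 8, 32, 32)
rolled = torch.cat([xy, xz, yz], dim=3)      # the roll-out: one (2, 8, 32, 96) plane

Symmetric modules would update the XZ and YZ planes, and the rolled-out
plane is what a single 2D diffusion network operates on; avoiding an
explicit 3D feature volume is where the memory and processing savings
described above come from.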