Denoising Diffusion Probabilistic Models for 3D Radiance Field Synthesis
DiffRF: Rendering-Guided 3D Radiance Field Diffusion
We introduce a novel approach for volumetric 3D radiance field synthesis based on denoising diffusion probabilistic models.
While existing diffusion-based methods operate on images, latent codes, or point cloud data, we are the first to directly generate volumetric radiance fields on an explicit voxel grid representation.
We address this challenge by pairing the denoising formulation with a rendering loss, enabling our model to learn a deviated prior that favours good image quality instead of trying to replicate fitting errors like floating artifacts.
In contrast to 2D diffusion models, our model learns multi-view consistent priors, enabling free-view synthesis and accurate shape generation.
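To make this pairing concrete, the following is a minimal PyTorch-style sketch of what a denoising step combined with a rendering loss could look like. It is an illustrative sketch under assumptions, not the paper's implementation: the function name, the linear noise schedule, the weighting lambda_rgb, and the `denoiser` and `renderer` callables (a 3D network over the voxel grid and a differentiable volume renderer, respectively) are placeholders.

```python
import torch
import torch.nn.functional as F

def ddpm_plus_rendering_loss(denoiser, renderer, x0, cams, images,
                             T=1000, lambda_rgb=0.1):
    """One training step pairing a DDPM denoising loss with a rendering loss.

    Minimal sketch, not the paper's implementation; `denoiser` (a 3D network
    over the voxel grid), `renderer` (a differentiable volume renderer), the
    noise schedule, and `lambda_rgb` are assumptions.

    x0:     (B, C, N, N, N) ground-truth radiance-field voxel grids
            (e.g. density + color features) fitted from posed images
    cams:   camera parameters of the posed training views
    images: (B, V, 3, H, W) corresponding RGB images
    """
    B = x0.shape[0]
    # Simple linear noise schedule (assumed); cumulative product of (1 - beta_t).
    betas = torch.linspace(1e-4, 0.02, T, device=x0.device)
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)

    t = torch.randint(0, T, (B,), device=x0.device)        # random timestep per sample
    a = alpha_bar[t].view(B, 1, 1, 1, 1)
    eps = torch.randn_like(x0)
    x_t = a.sqrt() * x0 + (1.0 - a).sqrt() * eps            # forward diffusion

    eps_hat = denoiser(x_t, t)                              # predict the added noise
    loss_ddpm = F.mse_loss(eps_hat, eps)                    # standard denoising objective

    # Estimate the clean field from the noise prediction and render it, so the
    # prior is pushed toward good image quality rather than fitting artifacts.
    x0_hat = (x_t - (1.0 - a).sqrt() * eps_hat) / a.sqrt()
    rendered = renderer(x0_hat, cams)                       # (B, V, 3, H, W)
    loss_rgb = F.mse_loss(rendered, images)

    return loss_ddpm + lambda_rgb * loss_rgb
```

The design point sketched here is that the rendering loss supervises renderings of the estimated clean field x0_hat rather than only matching the fitted ground-truth grids, which is what lets the learned prior deviate from fitting artifacts while remaining anchored by the standard denoising objective.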
Authors
Norman Müller, Yawar Siddiqui, Lorenzo Porzi, Samuel Rota Bulò, Peter Kontschieder, Matthias Nießner
Diffusion-based models have recently taken the computer vision research community by storm, performing on par with or even surpassing generative adversarial networks on multiple 2D benchmarks and producing photo-realistic images that are almost indistinguishable from real photographs.
For multi-modal or conditional settings such as text-to-image synthesis, we currently observe unprecedented output quality and diversity from diffusion-based approaches.
However, lifting the denoising-diffusion formulation directly to 3D volumetric radiance fields remains challenging.
The main reason lies in the nature of diffusion models, which require direct supervision against ground-truth data samples: for radiance fields, such ground truth is not readily available and must itself be fitted from posed images, a process that is ambiguous and prone to artifacts.
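For reference, the standard denoising diffusion objective trains a noise-prediction network epsilon_theta directly against ground-truth samples x_0 (written in the usual notation, with \bar{\alpha}_t the cumulative noise schedule); this is the generic DDPM formulation, not necessarily the exact loss used here:

\mathcal{L}_{\mathrm{DDPM}} = \mathbb{E}_{x_0,\, t,\, \epsilon \sim \mathcal{N}(0, I)} \Big[ \big\| \epsilon - \epsilon_\theta\big(\sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,\ t\big) \big\|^2 \Big]

A model trained with this objective alone is driven to reproduce the x_0 samples exactly; when those samples are radiance fields fitted from posed images, their fitting errors (e.g. floating artifacts) become part of the learned prior, which is what the rendering-guided formulation above is designed to avoid.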
In this work, we present the first diffusion-based generative model that directly synthesizes 3D radiance fields, thus unlocking high-quality 3D asset generation for both shape and appearance.
Results
We introduce a novel approach for 3D radiance field synthesis based on denoising diffusion probabilistic models.
Our model learns multi-view consistent priors from collections of posed images, enabling free-view image synthesis and accurate shape generation.
We evaluate our model on several object classes, comparing its performance against state-of-the-art GAN-based approaches and demonstrating its effectiveness in both conditional and unconditional 3D generation tasks.
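As an illustration of how unconditional generation proceeds at inference time with such a model, the sketch below runs standard DDPM ancestral sampling over a radiance-field voxel grid; the `denoiser` callable, the grid shape, and the noise schedule are assumptions for illustration rather than the paper's settings. The resulting voxel field can then be volume-rendered from arbitrary camera poses for free-view synthesis, and conditional variants would additionally provide conditioning signals to the denoiser.

```python
import torch

@torch.no_grad()
def sample_radiance_field(denoiser, shape=(1, 4, 32, 32, 32), T=1000):
    """Unconditional reverse diffusion over a radiance-field voxel grid (sketch).

    `denoiser`, the grid shape, and the schedule are illustrative assumptions.
    """
    betas = torch.linspace(1e-4, 0.02, T)
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)

    x = torch.randn(shape)                                   # start from pure noise
    for t in reversed(range(T)):
        eps_hat = denoiser(x, torch.full((shape[0],), t))    # predict noise at step t
        # DDPM posterior mean for the epsilon-parameterization.
        coef = betas[t] / (1.0 - alpha_bar[t]).sqrt()
        mean = (x - coef * eps_hat) / alphas[t].sqrt()
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + betas[t].sqrt() * noise                   # sigma_t^2 = beta_t choice
    return x                                                 # denoised voxel radiance field
```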