SparseFusion: Distilling View-conditioned Diffusion for 3D Reconstruction
We propose a sparse-view 3D reconstruction approach that unifies recent advances in neural rendering and probabilistic image generation.
We evaluate our approach across 51 categories in the CO3D dataset and show that it outperforms existing methods, in both distortion and perception metrics, for sparse-view novel view synthesis.
We build a computational approach that can similarly predict 3D from just a few images by integrating visual measurements and priors via probabilistic modeling, and then seeking likely 3D modes.
Leveraging a geometrically-informed backbone that computes pixel-aligned features in the query view, our approach learns a (conditional) diffusion model that can then infer detailed plausible novel-view images.
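For concreteness, the sketch below illustrates the general idea of a view-conditioned denoiser in PyTorch: a network predicts the noise in a noisy query-view image given pixel-aligned conditioning features, trained with a standard DDPM-style objective. The module names and architecture here are illustrative placeholders (e.g., the timestep embedding is omitted), not the exact model used in this work.

```python
# Minimal sketch (PyTorch) of a view-conditioned diffusion denoiser.
# Assumptions: pixel-aligned features for the query view are already computed
# by a geometry-aware backbone; the timestep embedding is omitted for brevity.
import torch
import torch.nn as nn

class CondDenoiser(nn.Module):
    def __init__(self, img_ch=3, feat_ch=64, hidden=128):
        super().__init__()
        # Fuse the noisy query-view image with per-pixel conditioning features.
        self.net = nn.Sequential(
            nn.Conv2d(img_ch + feat_ch, hidden, 3, padding=1), nn.SiLU(),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.SiLU(),
            nn.Conv2d(hidden, img_ch, 3, padding=1),
        )

    def forward(self, noisy_img, t, pixel_feats):
        # pixel_feats: (B, feat_ch, H, W) features aligned to the query view.
        # t is unused here because the timestep embedding is omitted.
        h = torch.cat([noisy_img, pixel_feats], dim=1)
        return self.net(h)  # predicted noise

def diffusion_loss(model, x0, pixel_feats, alphas_cumprod):
    # Standard DDPM-style training step: noise the target image and
    # regress the injected noise, conditioned on the pixel-aligned features.
    B = x0.shape[0]
    t = torch.randint(0, alphas_cumprod.shape[0], (B,), device=x0.device)
    a = alphas_cumprod[t].view(B, 1, 1, 1)
    eps = torch.randn_like(x0)
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * eps
    return ((model(x_t, t, pixel_feats) - eps) ** 2).mean()
```

At inference, sampling from this conditional model yields plausible query-view images, but each view is sampled independently.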
While this probabilistic image synthesis approach allows the generation of higher-quality image outputs, it does not directly yield a 3D representation of the underlying object.
In fact, the (independently) sampled outputs for each query view often do not even correspond to a consistent underlying 3D: if the nose of the teddybear is unobserved in the context views, one sampled query view may paint it red, while another paints it black.
To obtain a consistent 3D representation, we propose a technique that distills the predicted distributions into an instance-specific 3D representation.
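The sketch below shows one way a single distillation step could look under a score-distillation-style objective: render the instance-specific 3D representation at a query pose, perturb the rendering with noise, let the view-conditioned diffusion model propose a denoised image, and update only the 3D parameters toward that proposal. The differentiable renderer `render_fn`, the parameters `params`, and the specific loss are illustrative assumptions rather than the precise procedure used in this work.

```python
# Minimal sketch of distilling a view-conditioned diffusion model into an
# instance-specific 3D representation. Assumes a 1000-step noise schedule
# (alphas_cumprod), a differentiable renderer render_fn(params, pose), and a
# denoiser(x_t, t, pixel_feats) as in the previous sketch; all are hypothetical.
import torch
import torch.nn.functional as F

def distill_step(render_fn, params, denoiser, pixel_feats, pose,
                 alphas_cumprod, opt):
    # Render the current 3D representation at the query pose (differentiable).
    x = render_fn(params, pose)                          # (1, 3, H, W)
    t = torch.randint(200, 800, (1,), device=x.device)   # mid-range noise level
    a = alphas_cumprod[t].view(1, 1, 1, 1)
    eps = torch.randn_like(x)
    x_t = a.sqrt() * x + (1 - a).sqrt() * eps
    with torch.no_grad():
        eps_hat = denoiser(x_t, t, pixel_feats)
        # One-step estimate of the clean image implied by the diffusion model.
        x0_hat = (x_t - (1 - a).sqrt() * eps_hat) / a.sqrt()
    # Pull the rendering toward the diffusion model's plausible prediction;
    # gradients flow only through the rendered image into the 3D parameters.
    loss = F.mse_loss(x, x0_hat)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

Because the diffusion model's proposal is held fixed (no gradient) while the 3D parameters are optimized across many query poses, the sampled-view disagreements are averaged into a single consistent 3D mode.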