Text-Guided 3D Diffusion Models
3DDesigner: Towards Photorealistic 3D Object Generation and Editing with Text-guided Diffusion Models
Text-guided diffusion models have shown superior performance in image/video generation and editing. While such models have rarely been explored in 3D scenarios, we discuss three fundamental and interesting problems on this topic. First, we equip text-guided diffusion models to achieve 3D-consistent generation. Specifically, we integrate a neural field to generate low-resolution coarse results for a given camera view. Such results can provide 3D priors as condition information for the following diffusion process. During denoising diffusion, we further enhance the 3D consistency by modeling cross-view correspondences with a novel two-stream (corresponding to two different views) asynchronous diffusion process. Second, we propose a two-step solution that can generate manipulated results by editing an object from a single view. In step 1, we perform 2D local editing by blending the predicted noises. In step 2, we conduct a noise-to-text inversion process that maps the 2D blended noises into the view-independent text embedding space. Once the corresponding text embedding is obtained, edited images can be generated. Last but not least, we extend our model to perform one-shot novel view synthesis by fine-tuning on a single image, showing for the first time the potential of leveraging text guidance for novel view synthesis.
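The two-stream asynchronous idea can be pictured as denoising two views jointly, with one stream lagging a fixed number of timesteps behind the other while each stream's noise prediction sees the other stream's latent and a coarse neural-field rendering as the 3D prior. The sketch below is only a minimal illustration under those assumptions: `denoiser`, `nerf_prior`, the DDIM-style update, and the `lag` offset are hypothetical stand-ins, not the paper's actual components.

```python
import torch

# Assumed (hypothetical) callables standing in for the paper's modules:
#   nerf_prior(view)                      -> low-resolution coarse rendering (3D prior)
#   denoiser(x, t, text_emb, prior, other) -> noise prediction for one stream,
#                                             conditioned on the other stream's latent
def two_stream_asynchronous_sample(denoiser, nerf_prior, text_emb,
                                   view_a, view_b, steps=50, lag=1,
                                   shape=(1, 4, 64, 64)):
    betas = torch.linspace(1e-4, 0.02, steps)
    abar = torch.cumprod(1.0 - betas, dim=0)          # cumulative alpha products

    x = {"a": torch.randn(shape), "b": torch.randn(shape)}     # two view latents
    prior = {"a": nerf_prior(view_a), "b": nerf_prior(view_b)} # coarse 3D priors

    def ddim_step(x_t, eps, t):
        # Generic deterministic DDIM-style update (not the paper's exact sampler).
        x0 = (x_t - (1 - abar[t]).sqrt() * eps) / abar[t].sqrt()
        t_prev = max(t - 1, 0)
        return abar[t_prev].sqrt() * x0 + (1 - abar[t_prev]).sqrt() * eps

    for t in reversed(range(steps)):
        t_a, t_b = t, min(t + lag, steps - 1)   # stream B runs `lag` steps behind A
        eps_a = denoiser(x["a"], t_a, text_emb, prior["a"], other=x["b"])
        eps_b = denoiser(x["b"], t_b, text_emb, prior["b"], other=x["a"])
        x["a"] = ddim_step(x["a"], eps_a, t_a)
        x["b"] = ddim_step(x["b"], eps_b, t_b)
    # A final catch-up step for the lagging stream is omitted for brevity.
    return x["a"], x["b"]
```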
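The noise-to-text inversion step can likewise be pictured as an optimization over the text embedding so that the model's noise prediction reproduces the blended noise obtained from the 2D local edit. The loop below is a minimal sketch under that assumption; `denoiser`, `blended_eps` (a per-timestep collection of blended noises from step 1), and the hyperparameters are illustrative, not the paper's formulation.

```python
import torch

def invert_noise_to_text(denoiser, x_edit, blended_eps, init_emb,
                         steps=50, iters=500, lr=1e-3):
    betas = torch.linspace(1e-4, 0.02, steps)
    abar = torch.cumprod(1.0 - betas, dim=0)

    # Optimize a view-independent text embedding, starting from the original prompt's.
    text_emb = init_emb.clone().requires_grad_(True)
    opt = torch.optim.Adam([text_emb], lr=lr)

    for _ in range(iters):
        t = torch.randint(0, steps, (1,)).item()
        eps = torch.randn_like(x_edit)
        # Noise the edited view to timestep t, then ask the model to denoise it.
        x_t = abar[t].sqrt() * x_edit + (1 - abar[t]).sqrt() * eps
        pred = denoiser(x_t, t, text_emb)
        # Match the prediction to the blended noise from the 2D local edit (assumed objective).
        loss = torch.nn.functional.mse_loss(pred, blended_eps[t])
        opt.zero_grad()
        loss.backward()
        opt.step()
    return text_emb.detach()   # reused as conditioning to generate the edited object
```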