In this paper, we present a two-stage optimization framework for text-to-3D synthesis.
First, we obtain a coarse model using a low-resolution diffusion prior and accelerate optimization with a sparse 3D hash grid structure.
Together with image-conditioned generation capabilities, we provide users with new ways to control 3D synthesis, opening up new avenues for various creative applications.
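As a concrete illustration of the sparse hash-grid acceleration used in the coarse stage, below is a minimal sketch of a multi-resolution hash-grid encoding in the spirit of Instant NGP. The class name and hyperparameters are illustrative assumptions rather than the paper's exact implementation, and the lookup is simplified to nearest-corner indexing instead of trilinear interpolation.

import torch
import torch.nn as nn

class HashGridEncoding(nn.Module):
    """Multi-resolution hash-grid feature encoding (Instant NGP style)."""

    def __init__(self, num_levels=16, features_per_level=2,
                 log2_table_size=19, base_res=16, max_res=2048):
        super().__init__()
        self.table_size = 2 ** log2_table_size
        # Geometric growth so level resolutions span [base_res, max_res].
        growth = (max_res / base_res) ** (1.0 / (num_levels - 1))
        self.resolutions = [int(base_res * growth ** i) for i in range(num_levels)]
        # One small learnable feature table per resolution level.
        self.tables = nn.ParameterList(
            [nn.Parameter(1e-4 * torch.randn(self.table_size, features_per_level))
             for _ in range(num_levels)]
        )
        # Large primes for spatial hashing, as in Instant NGP.
        self.register_buffer("primes", torch.tensor([1, 2654435761, 805459861]))

    def forward(self, x):  # x: (N, 3) points in [0, 1]^3
        feats = []
        for res, table in zip(self.resolutions, self.tables):
            # Simplified nearest-corner lookup; a full implementation would
            # trilinearly interpolate the 8 surrounding grid corners.
            idx = (x * res).long()
            h = (idx[:, 0] * self.primes[0]) \
                ^ (idx[:, 1] * self.primes[1]) \
                ^ (idx[:, 2] * self.primes[2])
            feats.append(table[h % self.table_size])
        return torch.cat(feats, dim=-1)  # (N, num_levels * features_per_level)

Because only occupied grid cells need trained features, this representation keeps the coarse optimization fast while still supporting fine spatial detail at the upper resolution levels.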
Authors
Chen-Hsuan Lin, Jun Gao, Luming Tang, Towaki Takikawa, Xiaohui Zeng, Xun Huang, Karsten Kreis, Sanja Fidler, Ming-Yu Liu, Tsung-Yi Lin
Creating professional 3D content is not accessible to everyone, as it requires immense artistic and aesthetic training along with 3D modeling expertise.
Developing these skill sets takes a significant amount of time and effort.
Augmenting 3D content creation with natural language could considerably help democratize it for novices and turbocharge expert artists.
In this paper, we present a method that can synthesize highly detailed 3D models from text prompts in reduced computation time.
Specifically, we propose a coarse-to-fine optimization approach that uses multiple diffusion priors at different resolutions to optimize the 3D representation, enabling the generation of both view-consistent geometry and high-resolution details.
Our approach produces high-fidelity 3D content that can be conveniently imported and visualized in standard graphics software, and it does so at $2\times$ the speed of DreamFusion.
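To make the diffusion-prior-driven optimization concrete, the sketch below shows one score distillation sampling (SDS) style step of the kind such pipelines build on. Here render_fn and diffusion_prior are hypothetical placeholders for a differentiable renderer and a frozen text-conditioned denoiser, and the linear beta schedule is a standard DDPM assumption, not the paper's exact recipe; the coarse stage would run this with a low-resolution prior on the hash-grid scene model and the fine stage with a high-resolution prior on the mesh.

import torch

def make_alphas_cumprod(num_steps=1000):
    """Standard DDPM linear beta schedule; returns cumulative alpha products."""
    betas = torch.linspace(1e-4, 0.02, num_steps)
    return torch.cumprod(1.0 - betas, dim=0)

ALPHAS_CUMPROD = make_alphas_cumprod()

def sds_step(render_fn, scene_params, camera, diffusion_prior, prompt_emb, optimizer):
    """One SDS-style step: render a view, noise it at a random timestep, and
    push the rendering toward the frozen prior's denoising direction."""
    optimizer.zero_grad()
    image = render_fn(scene_params, camera)          # differentiable rendering
    t = torch.randint(20, 980, (1,))                 # random diffusion timestep
    a = ALPHAS_CUMPROD[t].view(1, 1, 1, 1)
    noise = torch.randn_like(image)
    noisy = a.sqrt() * image + (1.0 - a).sqrt() * noise  # forward diffusion
    with torch.no_grad():
        pred_noise = diffusion_prior(noisy, t, prompt_emb)
    # SDS uses (pred_noise - noise) directly as the gradient w.r.t. the image,
    # skipping backpropagation through the diffusion model itself.
    image.backward(gradient=(pred_noise - noise))
    optimizer.step()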
Results
We propose Magic3D, a fast and high-quality text-to-3D generation framework.
It takes 40 minutes to go from a text prompt to a high-quality 3D mesh model ready to be used in graphics engines.
We benefit from both efficient scene models and high-resolution diffusion priors in a coarse-to-fine approach.
In particular, the 3D mesh models scale well with image resolution and enjoy the benefits of the higher-resolution supervision brought by the latent diffusion model without sacrificing speed.
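A hedged sketch of how that latent-space supervision could look is given below: the high-resolution mesh rendering is encoded by the frozen latent diffusion model's encoder, and the SDS gradient is computed on the small latent map, so the denoiser stays cheap while the supervision resolution stays high. render_mesh, vae_encoder, and latent_unet are stand-in names under these assumptions, not the paper's API.

import torch

def latent_sds_step(render_mesh, mesh_params, vae_encoder, latent_unet,
                    prompt_emb, optimizer, alphas_cumprod):
    """Fine-stage step: encode the high-resolution rendering into the LDM's
    latent space and compute the SDS gradient there, so the expensive
    denoiser only ever sees a small latent map."""
    optimizer.zero_grad()
    image = render_mesh(mesh_params)             # e.g., a 512x512 rendering
    latents = vae_encoder(image)                 # e.g., a 64x64 latent map
    t = torch.randint(20, 980, (1,))
    a = alphas_cumprod[t].view(1, 1, 1, 1)
    noise = torch.randn_like(latents)
    noisy = a.sqrt() * latents + (1.0 - a).sqrt() * noise
    with torch.no_grad():
        pred_noise = latent_unet(noisy, t, prompt_emb)
    # Gradients flow from the latents back through the frozen encoder into
    # the mesh's geometry and texture parameters.
    latents.backward(gradient=(pred_noise - noise))
    optimizer.step()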
With extensive user studies and qualitative comparisons, we show that Magic3D is preferred by raters 61.7% of the time compared to DreamFusion, while enjoying a $2\times$ speed-up.
Finally, we propose a set of tools for better controlling style and content in 3D generation.