In this paper, we make the first attempt to introduce an explicit 3D shape prior into CLIP-guided 3D optimization methods.
Specifically, we first generate a high-quality 3D shape from the input text in the text-to-shape stage, use it as the 3D shape prior to initialize a neural radiance field (NeRF), and then optimize the field with the full prompt.
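To make the two-stage pipeline concrete, the following is a minimal PyTorch sketch. All names here (TinyNeRF, clip_score, the point-based stand-in for volume rendering) are illustrative placeholders under simplifying assumptions, not our actual implementation.

```python
import torch
import torch.nn as nn

class TinyNeRF(nn.Module):
    """Toy radiance field: maps 3D points to (density, rgb)."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 4))

    def forward(self, pts):                           # pts: (N, 3)
        out = self.mlp(pts)
        return out[:, :1], torch.sigmoid(out[:, 1:])  # density, rgb

def clip_score(rgb, prompt):
    """Placeholder for a real CLIP image-text similarity (illustrative only)."""
    return -((rgb - 0.5) ** 2).mean()

def init_from_shape(nerf, shape_pts, steps=200):
    """Stage-1 hand-off: fit the NeRF density to points sampled from the
    generated shape, so optimization starts from plausible geometry."""
    opt = torch.optim.Adam(nerf.parameters(), lr=1e-3)
    for _ in range(steps):
        density, _ = nerf(shape_pts)
        loss = ((density - 1.0) ** 2).mean()  # occupied points -> high density
        opt.zero_grad(); loss.backward(); opt.step()

def optimize_with_prompt(nerf, prompt, steps=1000):
    """Stage 2: refine the initialized NeRF against the full prompt."""
    opt = torch.optim.Adam(nerf.parameters(), lr=5e-4)
    for _ in range(steps):
        pts = torch.rand(1024, 3) * 2 - 1     # stand-in for rendered rays
        _, rgb = nerf(pts)
        loss = -clip_score(rgb, prompt)       # maximize image-text similarity
        opt.zero_grad(); loss.backward(); opt.step()

nerf = TinyNeRF()
init_from_shape(nerf, shape_pts=torch.rand(4096, 3) * 2 - 1)  # stand-in shape samples
optimize_with_prompt(nerf, prompt="a chair shaped like an avocado")
```

The key design choice the sketch reflects is that the shape prior only supplies the geometry initialization; all subsequent appearance and refinement come from prompt-driven optimization.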
We show that classifier-free guidance can be leveraged as a critic, enabling generators to distill knowledge from large-scale text-to-image diffusion models and efficiently shift into new domains indicated by text prompts, without access to ground-truth samples from the target domains.
The proposed method is the first attempt to incorporate large-scale pre-trained diffusion models and distillation sampling into text-driven image generator domain adaptation, and achieves a quality beyond what was previously possible.
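As an illustration of how the guidance signal can act as a critic, below is a minimal PyTorch sketch of a score-distillation loss with classifier-free guidance. It assumes a diffusers-style UNet and pre-computed conditional/unconditional text embeddings; the function and variable names are our own illustrative choices rather than a verbatim implementation.

```python
import torch

def sds_cfg_loss(x, unet, alphas_cumprod, cond_emb, uncond_emb,
                 guidance_scale=7.5):
    """Score-distillation loss on a generator output x: (B, C, H, W).

    The frozen diffusion model scores a noised version of x; the
    classifier-free-guided noise prediction plays the role of a critic.
    """
    b = x.shape[0]
    t = torch.randint(50, 950, (b,), device=x.device)    # random timestep
    noise = torch.randn_like(x)
    a = alphas_cumprod[t].view(b, 1, 1, 1)
    x_t = a.sqrt() * x + (1 - a).sqrt() * noise          # forward diffusion

    with torch.no_grad():                                # critic stays frozen
        eps_uncond = unet(x_t, t, encoder_hidden_states=uncond_emb).sample
        eps_cond = unet(x_t, t, encoder_hidden_states=cond_emb).sample
        # Classifier-free guidance: push prediction toward the text condition.
        eps_hat = eps_uncond + guidance_scale * (eps_cond - eps_uncond)

    w = 1 - a                                            # timestep weighting
    grad = w * (eps_hat - noise)                         # distillation gradient
    # Reparameterize the gradient as a loss whose d/dx equals `grad`.
    return 0.5 * ((x - (x - grad).detach()) ** 2).sum() / b
```

In use, x = G_theta(z) is the adapted generator's output; back-propagating this loss through theta shifts the generator toward the domain described by the prompt, while the diffusion model itself is never updated.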
Text-to-image synthesis has been a revolutionary breakthrough in the evolution of generative artificial intelligence (generative AI), allowing us to synthesize diverse images that convey highly complex visual concepts.
However, a pivotal challenge in leveraging such models for real-world content creation tasks is providing users with control over the generated content.