Personalization of Text-to-Image Diffusion Models
DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation
We present a new approach for "personalization" of text-to-image diffusion models (specializing them to users' needs). Given as input just a few images of a subject, we fine-tune a pretrained text-to-image model (Imagen) such that it learns to bind a unique identifier with that specific subject. Once the subject is embedded in the output domain of the model, the unique identifier can be used to synthesize fully novel photorealistic images of the subject contextualized in different scenes. By leveraging the semantic prior embedded in the model with a new autogenous class-specific prior-preservation loss, our technique enables synthesizing the subject in diverse scenes, poses, views, and lighting conditions that do not appear in the reference images.
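To make the prior-preservation idea concrete, the fine-tuning objective can be sketched as a two-term denoising loss, following the formulation in the DreamBooth paper. Here x is a subject image with conditioning vector c, x_pr is an image generated by the frozen pretrained model from a class prompt c_pr, alpha_t, sigma_t, and w_t are the diffusion noise schedule and per-timestep weighting, and lambda balances the two terms:

\mathbb{E}_{\mathbf{x},\mathbf{c},\boldsymbol{\epsilon},\boldsymbol{\epsilon}',t}\!\left[ w_t \left\| \hat{\mathbf{x}}_\theta(\alpha_t \mathbf{x} + \sigma_t \boldsymbol{\epsilon}, \mathbf{c}) - \mathbf{x} \right\|_2^2 + \lambda\, w_{t'} \left\| \hat{\mathbf{x}}_\theta(\alpha_{t'} \mathbf{x}_{\mathrm{pr}} + \sigma_{t'} \boldsymbol{\epsilon}', \mathbf{c}_{\mathrm{pr}}) - \mathbf{x}_{\mathrm{pr}} \right\|_2^2 \right]

The first term fits the model to the few subject images; the second supervises the model on its own class-conditioned generations, which discourages the fine-tuned weights from drifting away from, and overwriting, the pretrained prior for the subject's class.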