InstructPix2Pix: Learning to Follow Image Editing Instructions
We propose a method for editing images from human instructions.
Given an input image and a written instruction that tells the model what to do, our model follows these instructions to edit the image.
Since it performs edits in the forward pass and does not require per-example fine-tuning or inversion, our model edits images quickly, in a matter of seconds.
To obtain training data for this problem, we combine the knowledge of two large pretrained models to generate a large dataset of image editing examples.
Our conditional diffusion model, InstructPix2Pix, is trained on our generated data and generalizes to real images and user-written instructions at inference time.
We show compelling editing results for a diverse collection of input images and written instructions.
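To make the data-generation idea above concrete, the sketch below pairs a pretrained language model (which turns an image caption into an edit instruction plus an edited caption) with a pretrained text-to-image model (which renders both captions into a before/after pair). The function and argument names are hypothetical placeholders for illustration, not the paper's actual pipeline.

```python
# Hypothetical sketch: combine two pretrained models to produce one
# (input image, instruction, edited image) training triplet.
# `propose_edit` and `text_to_image` are placeholder callables, not real APIs.
from typing import Callable, Tuple


def make_editing_example(
    caption: str,
    propose_edit: Callable[[str], Tuple[str, str]],  # caption -> (instruction, edited caption)
    text_to_image: Callable[[str], object],          # caption -> rendered image
):
    """Generate one image-editing training example from a single caption."""
    instruction, edited_caption = propose_edit(caption)
    # Render the scene before and after the edit. In practice the two
    # renders must depict the same underlying scene, which requires extra
    # machinery beyond two independent text-to-image calls.
    before = text_to_image(caption)
    after = text_to_image(edited_caption)
    return before, instruction, after
```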
We present a method for teaching a generative model to follow human-written instructions for image editing.
Given an input image and a text instruction for how to edit it, our model directly performs the image edit in the forward pass, and does not require any additional example images, full descriptions of the input/output images, or per-example finetuning.
Our model enables intuitive image editing that follows human instructions to perform a diverse collection of edits: replacing objects, changing the style of an image, changing the setting or the artistic medium, and more.
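As a usage illustration of this forward-pass editing, the following is a minimal inference sketch assuming the Hugging Face diffusers integration and a publicly released InstructPix2Pix checkpoint; the "timbrooks/instruct-pix2pix" model ID and the input filename are assumptions, not details from this text.

```python
import torch
from PIL import Image
from diffusers import StableDiffusionInstructPix2PixPipeline

# Load an assumed public InstructPix2Pix checkpoint via diffusers.
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

# The edit is a single forward pass conditioned on the image and the instruction.
image = Image.open("input.jpg").convert("RGB")
edited = pipe(
    "make it look like a watercolor painting",  # written instruction
    image=image,
    num_inference_steps=20,
    image_guidance_scale=1.5,  # how closely to follow the input image
    guidance_scale=7.5,        # how closely to follow the instruction
).images[0]
edited.save("edited.jpg")
```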