Shape-Guided Diffusion with Inside-Outside Attention
Shape can specify key object constraints, yet existing text-to-image
diffusion models ignore this cue and synthesize objects that are incorrectly
scaled, cut off, or replaced with background content. We propose a
training-free method, Shape-Guided Diffusion, which uses a novel Inside-Outside
Attention mechanism to constrain the cross-attention (and self-attention) maps
such that prompt tokens (and pixels) referring to the inside of the shape
cannot attend outside the shape, and vice versa (sketched in code below). To
demonstrate the efficacy of
our method, we propose a new image editing task where the model must replace an
object specified by its mask and a text prompt. We curate a new ShapePrompts
benchmark based on MS-COCO and achieve SOTA results in shape faithfulness, text
alignment, and realism according to both quantitative metrics and human
preferences. Our data and code will be made available at this https URL.
Authors
Dong Huk Park, Grace Luo, Clayton Toste, Samaneh Azadi, Xihui Liu, Maka Karalashvili, Anna Rohrbach, Trevor Darrell
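
The following is a minimal, illustrative PyTorch sketch of the inside/outside
masking idea described in the abstract; it is not the authors' released
implementation. The helper names (inside_outside_mask, masked_attention), the
toy latent size, and the token split are hypothetical, and a real diffusion
U-Net would apply this masking inside every cross- and self-attention layer.

    import torch

    def inside_outside_mask(q_inside, k_inside):
        # True where query and key fall on the same side of the shape
        # (both inside or both outside); attention across sides is blocked.
        return q_inside[:, None] == k_inside[None, :]

    def masked_attention(sim, allow):
        # sim: (..., Q, K) raw attention logits; allow: (Q, K) boolean mask.
        # Disallowed entries are set to the dtype's minimum before the
        # softmax over keys, so inside queries place (near-)zero weight on
        # outside keys, and vice versa.
        sim = sim.masked_fill(~allow, torch.finfo(sim.dtype).min)
        return sim.softmax(dim=-1)

    # Toy setup: a flattened 4x4 latent (16 pixels) and a 6-token prompt.
    hw, n_tokens, heads = 16, 6, 8
    pixel_inside = torch.zeros(hw, dtype=torch.bool)
    pixel_inside[5:11] = True  # hypothetical object mask over the latent
    # Hypothetical split: which prompt tokens refer to the object (inside).
    token_inside = torch.tensor([False, True, True, False, False, False])

    # Cross-attention: pixel queries attend to prompt-token keys.
    cross_sim = torch.randn(1, heads, hw, n_tokens)
    cross_attn = masked_attention(
        cross_sim, inside_outside_mask(pixel_inside, token_inside))

    # Self-attention: pixel queries attend to pixel keys.
    self_sim = torch.randn(1, heads, hw, hw)
    self_attn = masked_attention(
        self_sim, inside_outside_mask(pixel_inside, pixel_inside))

In this sketch the same boolean rule serves both attention types: for
cross-attention the keys are prompt tokens, for self-attention they are
pixels, and in either case a query may only attend to keys on its own side
of the object mask.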