Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors - 42Papers