M-VADER: a diffusion model for image generation using combinations of images and text