Scaling Up Vision-Language Pre-training for Image Captioning - 42Papers