RedCaps: A Large-scale Dataset of Images and Captions from Reddit
RedCaps: web-curated image-text data created by the people, for the people
We introduce a large-scale dataset of 12 million image-text pairs collected from Reddit.
We show that captioning models trained on this dataset produce rich and varied captions preferred by humans, and learn visual representations that transfer to many downstream tasks.
Authors
Karan Desai, Gaurav Kaul, Zubin Aysola, Justin Johnson