Discriminative Self-Supervised Learning on a Diverse Set of Images
Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision
Discriminative self-supervised learning makes it possible to train models on any random group of internet images and potentially recover salient information that helps differentiate between those images.
In this work, we ask whether, using this ability, we can learn the salient and more representative information present in a diverse, unbounded set of images from across the globe.
To do so, we train models on billions of random images without any data pre-processing or prior assumptions about what we want the model to learn.
We scale our model size to a dense 10-billion-parameter architecture to avoid underfitting on such a large data size.
The resulting model not only captures semantic information well, it also captures information about artistic style and learns salient information such as geolocation and multilingual word embeddings based on visual content only.
More importantly, we discover that such a model is more robust, fairer, less harmful, and less biased than supervised models or models trained on object-centric datasets such as ImageNet.
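The abstract does not spell out the training objective, so as illustrative background only, here is a minimal NumPy sketch of a generic discriminative (contrastive, InfoNCE-style) loss of the kind this family of methods uses. This is not the paper's actual objective, and the function and variable names are hypothetical:

```python
import numpy as np

def info_nce_loss(z1, z2, temperature=0.1):
    """Generic contrastive (InfoNCE-style) loss over two augmented views.

    z1, z2: (N, D) embeddings of two views of the same N images.
    Each image's two views form a positive pair; the other images
    in the batch act as negatives, so the model must learn features
    that differentiate between images.
    """
    # L2-normalize so the dot product is cosine similarity.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature              # (N, N) similarity matrix
    # Subtract the row max for numerical stability of the softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Diagonal entries are the positive pairs.
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
# Identical views: positives are maximally similar, so the loss is low.
low = info_nce_loss(z, z)
# Unrelated "views": positives are no more similar than negatives.
high = info_nce_loss(z, rng.normal(size=(8, 16)))
```

Minimizing such a loss pulls embeddings of the same image together and pushes different images apart, which is the sense in which discriminative pretraining "recovers salient information that helps differentiate between the images".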
Authors
Priya Goyal, Quentin Duval, Isaac Seessel, Mathilde Caron, Mannat Singh, Ishan Misra, Levent Sagun, Armand Joulin, Piotr Bojanowski