When Vision Transformers Outperform ResNets without Pretraining or Strong Data Augmentations