Are Pre-trained Convolutions Better than Pre-trained Transformers?