Adam-based Communication Compression for Large BERT and GPT-3 Models - 42Papers