1-bit Adam: Communication Efficient Large-Scale Training with Adam's Convergence Speed