1-bit LAMB: Communication Efficient Large-Scale Large-Batch Training with LAMB's Convergence Speed