1-bit LAMB with NCCL-based Backend