What Language Model to Train if You Have One Million GPU Hours?