Training Compute-Optimal Large Language Models