Beyond neural scaling laws: beating power law scaling via data pruning