Learning Sparse Linear Neural Networks with Gradient Descent - 42Papers