Sparse Updates for Gradient-Based Deep Neural Network Training
Training Neural Networks with Fixed Sparse Masks
We show that it is possible to induce a fixed sparse mask on the model's parameters that selects a subset to update over many iterations.
Our method constructs the mask out of the parameters with the largest Fisher information, a simple approximation of which parameters are most important for the task at hand.
Experiments on parameter-efficient transfer learning and distributed training show that our approach matches or exceeds the performance of other methods for training with sparse updates, while being more efficient in terms of memory usage and communication costs.
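To make the procedure concrete, the sketch below illustrates the general recipe in PyTorch: estimate a diagonal (empirical) Fisher approximation from a few batches, keep the top-scoring fraction of parameters as a fixed binary mask, and zero out the gradients of all other parameters during training. This is a minimal illustration under our own assumptions (helper names such as `estimate_fisher`, `build_sparse_mask`, and the `keep_ratio` value are hypothetical), not the authors' reference implementation.

```python
import torch

def estimate_fisher(model, data_loader, loss_fn, num_samples=1024):
    """Approximate the diagonal empirical Fisher information of each
    parameter as the average squared gradient over a few batches."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    seen = 0
    for inputs, targets in data_loader:
        model.zero_grad()
        loss_fn(model(inputs), targets).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
        seen += inputs.size(0)
        if seen >= num_samples:
            break
    return {n: f / max(seen, 1) for n, f in fisher.items()}

def build_sparse_mask(fisher, keep_ratio=0.005):
    """Keep only the keep_ratio fraction of parameters with the largest
    Fisher scores; all other parameters stay frozen."""
    scores = torch.cat([f.flatten() for f in fisher.values()])
    k = max(1, int(keep_ratio * scores.numel()))
    threshold = torch.topk(scores, k).values.min()
    return {n: (f >= threshold).float() for n, f in fisher.items()}

def apply_mask_to_grads(model, mask):
    """Zero the gradients of parameters outside the fixed mask so the
    optimizer only updates the selected subset."""
    for n, p in model.named_parameters():
        if p.grad is not None:
            p.grad.mul_(mask[n])
```

In a training loop, the mask would be computed once before fine-tuning and `apply_mask_to_grads` called after each `backward()` and before `optimizer.step()`, so only the selected subset of parameters is ever modified or communicated.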