Modern Deep Neural Networks (DNNs) require significant memory to store
weights, activations, and other intermediate tensors during training. Hence,
many models do not fit into a single GPU device, or can only be trained with a
small per-GPU batch size. This survey provides a systematic overview of the
approaches that enable more efficient DNN training. We analyze techniques that
save memory and make efficient use of computation and communication resources
on single-GPU and multi-GPU architectures. We summarize the main categories of
strategies and compare strategies both within and across these categories. Along with
approaches proposed in the literature, we discuss available implementations.
Authors
Julia Gusak, Daria Cherniuk, Alena Shilova, Alexander Katrutsa, Daniel Bershatsky, Xunyi Zhao, Lionel Eyraud-Dubois, Oleg Shlyazhko, Denis Dimitrov, Ivan Oseledets, Olivier Beaumont