We propose to change the forward rule of a deep residual neural network (resnet) by adding a momentum term.
The resulting networks, momentum residual neural networks (momentumnets), are invertible and can be used as a drop-in replacement for any existing resnet block.
We show that momentum residual neural networks can be interpreted in the infinitesimal step size regime as second-order ordinary differential equations (odes), and we exactly characterize how adding momentum progressively increases the representation capabilities of momentumnets.
In a learning-to-optimize setting, where convergence to a fixed point is required, we show theoretically and empirically that our method succeeds while existing invertible architectures fail.
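To make the modified forward rule concrete, here is a minimal sketch, not the paper's reference implementation: the block carries a velocity alongside the activation, and both can be reconstructed exactly from the block's output, which is what makes the block invertible without storing activations. The specific update form, the momentum value gamma, and the function names are assumptions for illustration only.

```python
import torch

def momentum_forward(x, v, f, gamma=0.9):
    # Sketch of a momentum residual step (assumed form):
    #   v_new = gamma * v + (1 - gamma) * f(x)
    #   x_new = x + v_new
    v_new = gamma * v + (1 - gamma) * f(x)
    x_new = x + v_new
    return x_new, v_new

def momentum_inverse(x_new, v_new, f, gamma=0.9):
    # Recover the inputs exactly (up to floating point) from the outputs,
    # so intermediate activations need not be stored for backpropagation.
    x = x_new - v_new
    v = (v_new - (1 - gamma) * f(x)) / gamma
    return x, v

if __name__ == "__main__":
    f = torch.nn.Linear(8, 8)          # stand-in for a residual branch
    x, v = torch.randn(4, 8), torch.zeros(4, 8)
    x1, v1 = momentum_forward(x, v, f)
    x0, v0 = momentum_inverse(x1, v1, f)
    print(torch.allclose(x, x0), torch.allclose(v, v0))
```

Setting gamma to zero in this sketch recovers a standard residual update x_new = x + f(x), which is why such a block can act as a drop-in replacement for an existing resnet block.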
Authors
Michael E. Sander, Pierre Ablin, Mathieu Blondel, Gabriel Peyré