Acceleration via Fractal Learning Rate Schedules
Naman Agarwal, Surbhi Goel, Cyril Zhang
When balancing the practical tradeoffs of iterative methods for large-scale
optimization, practitioners find the learning rate schedule notoriously
difficult to understand and expensive to tune. We demonstrate the presence of these
subtleties even in the innocuous case when the objective is a convex quadratic.
We reinterpret an iterative algorithm from the numerical analysis literature as
what we call the Chebyshev learning rate schedule for accelerating vanilla
gradient descent, and show that the problem of mitigating instability leads to
a fractal ordering of step sizes. We provide some experiments and discussion to
challenge the current understanding of the "edge of stability" in deep learning:
even in simple settings, provable acceleration can be obtained by making
negative local progress on the objective.
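As a concrete illustration (our own sketch, not code from the paper): below,
chebyshev_steps builds step sizes as inverses of the degree-T Chebyshev nodes on
an assumed eigenvalue range [mu, L], and interleave_order builds a recursive
interleaving permutation of the Lebedev-Finogenov kind, used here as a stand-in
for the fractal ordering the abstract refers to; all names and parameters are ours.

```python
import numpy as np

def chebyshev_steps(mu, L, T):
    """Step sizes 1/lambda_k, where lambda_k are the degree-T Chebyshev
    nodes mapped to [mu, L]; the product of the (1 - eta*lambda) factors
    is then the Chebyshev-optimal residual polynomial on that interval."""
    k = np.arange(1, T + 1)
    nodes = (L + mu) / 2 + (L - mu) / 2 * np.cos((2 * k - 1) * np.pi / (2 * T))
    return 1.0 / nodes

def interleave_order(T):
    """Recursive interleaving permutation (T a power of 2): pair each index
    with its 'mirror', then recurse. Assumed here as a proxy for the paper's
    fractal ordering."""
    order = np.array([0])
    while order.size < T:
        n = order.size
        order = np.stack([order, 2 * n - 1 - order], axis=1).ravel()
    return order

def chebyshev_gd(A, b, x0, mu, L, T):
    """Plain gradient descent on f(x) = 0.5*x^T A x - b^T x, applying the
    Chebyshev step sizes in the interleaved (stabilized) order."""
    x = x0.copy()
    steps = chebyshev_steps(mu, L, T)
    for eta in steps[interleave_order(T)]:
        x = x - eta * (A @ x - b)
    return x
```

In exact arithmetic the final iterate is the same under any ordering of the
steps, since the update factors (I - eta*A) commute; the ordering controls how
large the intermediate iterates get, which is the kind of instability (and
negative local progress on the objective) that the abstract refers to.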