Provable Acceleration of Heavy Ball beyond Quadratics for a Class of Polyak-Łojasiewicz Functions when the Non-Convexity is Averaged-Out
Heavy Ball (HB) nowadays is one of the most popular momentum methods in
non-convex optimization. It has been widely observed that incorporating the
Heavy Ball dynamic in gradient-based methods accelerates the training process
of modern machine learning models. However, the progress on establishing its
theoretical foundation of acceleration is apparently far behind its empirical
success. Existing provable acceleration results are of the quadratic or
close-to-quadratic functions, as the current techniques of showing HB's
acceleration are limited to the case when the Hessian is fixed. In this work,
we develop some new techniques that help show acceleration beyond quadratics,
which is achieved by analyzing how the change of the Hessian at two consecutive
time points affects the convergence speed. Based on our technical results, a
class of Polyak-\L{}ojasiewicz (PL) optimization problems for which provable
acceleration can be achieved via HB is identified. Moreover, our analysis
demonstrates a benefit of adaptively setting the momentum parameter.
Authors
Jun-Kun Wang, Chi-Heng Lin, Andre Wibisono, Bin Hu