Non-convex optimization is ubiquitous in modern machine learning. Researchers
devise non-convex objective functions and optimize them using off-the-shelf
optimizers such as stochastic gradient descent and its variants, which leverage
the local geometry and update iteratively. Even though solving non-convex
functions is NP-hard in the worst case, the optimization quality in practice is
often not an issue -- optimizers are largely believed to find approximate
global minima. Researchers hypothesize a unified explanation for this
intriguing phenomenon: most of the local minima of the practically-used
objectives are approximately global minima. We rigorously formalize it for
concrete instances of machine learning problems.