A Farewell to the Bias-Variance Tradeoff? An Overview of the Theory of Overparameterized Machine Learning
The rapid recent progress in machine learning (ML) has raised a number of
scientific questions that challenge the longstanding dogma of the field. One of
the most important riddles is the good empirical generalization of
overparameterized models. Overparameterized models are excessively complex with
respect to the size of the training dataset, and as a result they perfectly fit
(i.e., interpolate) the training data, which is usually noisy. Such
interpolation of noisy data is traditionally associated with detrimental
overfitting, and yet a wide range of interpolating models -- from simple linear
models to deep neural networks -- have recently been observed to generalize
extremely well on fresh test data. Indeed, the recently discovered double
descent phenomenon has revealed that highly overparameterized models often
improve over the best underparameterized model in test performance.
Understanding learning in this overparameterized regime requires new theory
and foundational empirical studies, even for the simplest case of the linear
model. The underpinnings of this understanding have been laid in very recent
analyses of overparameterized linear regression and related statistical
learning tasks, which resulted in precise analytic characterizations of double
descent. This paper provides a succinct overview of this emerging theory of
overparameterized ML (henceforth abbreviated as TOPML) that explains these
recent findings through a statistical signal processing perspective. We
emphasize the unique aspects that define the TOPML research area as a subfield
of modern ML theory and outline interesting open questions that remain.
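
To make the double descent picture above concrete, the following minimal simulation sketch (not taken from the paper; all sizes and parameter values are illustrative assumptions) fits minimum-norm, i.e., ridgeless, least-squares regression using the first p of d Gaussian features while the number of training samples n is held fixed. As p sweeps past n, the test error typically spikes near the interpolation threshold p ≈ n and then descends again in the overparameterized regime, tracing the double descent curve described above.

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative sizes (not from the paper): n training samples, d latent features,
# n_test test samples, and additive noise level on the responses.
n, d, n_test, noise_std = 40, 200, 2000, 0.5

beta = rng.normal(size=d) / np.sqrt(d)          # true coefficients, ||beta|| ~ 1
X_train = rng.normal(size=(n, d))
X_test = rng.normal(size=(n_test, d))
y_train = X_train @ beta + noise_std * rng.normal(size=n)
y_test = X_test @ beta + noise_std * rng.normal(size=n_test)

for p in (5, 10, 20, 35, 40, 45, 60, 100, 200):
    # Minimum-norm least-squares fit on the first p features: this is ordinary
    # least squares when p <= n and the least-norm interpolator when p > n.
    beta_hat = np.linalg.pinv(X_train[:, :p]) @ y_train
    test_mse = np.mean((X_test[:, :p] @ beta_hat - y_test) ** 2)
    print(f"p = {p:3d} (n = {n}): test MSE = {test_mse:.3f}")
```

The minimum-norm interpolator is computed via the Moore-Penrose pseudoinverse; in the overparameterized regime (p > n) it selects, among the infinitely many solutions that fit the noisy training data exactly, the one with smallest Euclidean norm, which is the setting analyzed in the precise characterizations of double descent referenced above.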