A Data-Augmentation Is Worth A Thousand Samples: Exact Quantification From Analytical Augmented Sample Moments
Data-augmentation (DA) is known to improve performance across tasks and datasets.
We derive several quantities in closed form, such as the expectation and variance of an image, loss, and model output under a given DA distribution.
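As an illustrative sketch (the notation $g$, $\mu_{\mathrm{DA}}$, $\Sigma_{\mathrm{DA}}$ is used here only for exposition), the first two augmented-sample moments of an image $x$ can be written as
\[
\mu_{\mathrm{DA}}(x) = \mathbb{E}_{g \sim \mathrm{DA}}\!\left[g(x)\right], \qquad
\Sigma_{\mathrm{DA}}(x) = \mathbb{E}_{g \sim \mathrm{DA}}\!\left[\left(g(x) - \mu_{\mathrm{DA}}(x)\right)\left(g(x) - \mu_{\mathrm{DA}}(x)\right)^{\top}\right],
\]
where $g$ denotes a randomly sampled augmentation applied to $x$.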
These derivations open new avenues to quantify the benefits and limitations of DA.
For example, we show that common data-augmentations require tens of thousands of samples for the loss at hand to be correctly estimated and for model training to converge.
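A rough way to see why so many samples are needed (a standard Monte-Carlo argument given as a sketch, not the paper's derivation): estimating the augmented loss of a single pair $(x, y)$ from $N$ sampled augmentations,
\[
\hat{\mathcal{L}}_N(x, y) = \frac{1}{N} \sum_{n=1}^{N} \ell\!\left(f_{\theta}(g_n(x)), y\right), \qquad g_n \sim \mathrm{DA},
\]
has variance $\operatorname{Var}_{g \sim \mathrm{DA}}\!\left[\ell(f_{\theta}(g(x)), y)\right] / N$, so driving the standard error below a tolerance $\varepsilon$ requires on the order of $\operatorname{Var}[\ell] / \varepsilon^{2}$ augmentation samples.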
We show that for a training loss to be stable under DA sampling, the model's saliency map (the gradient of the loss with respect to the model's input) must align with the eigenvector of the sample variance, under the considered DA, associated with the smallest eigenvalue, hinting at a possible explanation of why models tend to shift their focus from edges to textures.
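One way to see where this alignment condition comes from (a first-order sketch, assuming augmentations produce small perturbations of $x$ and reusing the notation above): expanding the loss around the unaugmented input gives
\[
\ell\!\left(f_{\theta}(g(x)), y\right) \approx \ell\!\left(f_{\theta}(x), y\right) + \nabla_{x}\ell\!\left(f_{\theta}(x), y\right)^{\top}\!\left(g(x) - x\right),
\]
so that
\[
\operatorname{Var}_{g \sim \mathrm{DA}}\!\left[\ell\!\left(f_{\theta}(g(x)), y\right)\right] \approx \nabla_{x}\ell^{\top}\, \Sigma_{\mathrm{DA}}(x)\, \nabla_{x}\ell,
\]
which is smallest when the saliency map $\nabla_{x}\ell$ aligns with the eigenvector of $\Sigma_{\mathrm{DA}}(x)$ associated with the smallest eigenvalue.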