We propose a variant of the epoch stochastic gradient descent ascent algorithm (ESGDA) with a simpler theoretical analysis.
The proposed algorithm, randomized epoch stochastic gradient descent ascent (RSGDA), performs a loop of stochastic gradient ascent (SGA) steps on the (inner) maximization problem, followed by an SGD step on the (outer) minimization.
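A minimal NumPy sketch of the epoch structure just described: an inner loop of SGA steps on the maximization variable y, followed by one SGD step on the minimization variable x. The toy quadratic saddle objective, the gradient oracles, the step sizes, and the Poisson-randomized inner-loop length are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def stoch_grad_x(x, y):
    # noisy gradient w.r.t. x of the toy saddle f(x, y) = 0.5||x||^2 + x.y - 0.5||y||^2
    return x + y + 0.01 * rng.standard_normal(x.shape)

def stoch_grad_y(x, y):
    # noisy gradient w.r.t. y of the same toy objective
    return x - y + 0.01 * rng.standard_normal(y.shape)

def rsgda(x, y, outer_steps=200, eta_x=0.05, eta_y=0.1, mean_inner=5):
    for _ in range(outer_steps):
        # randomized epoch length for the inner maximization (an assumption:
        # the summary above does not specify how the loop is randomized)
        inner = 1 + rng.poisson(mean_inner)
        for _ in range(inner):
            y = y + eta_y * stoch_grad_y(x, y)   # SGA steps on the inner max
        x = x - eta_x * stoch_grad_x(x, y)       # one SGD step on the outer min
    return x, y

x_opt, y_opt = rsgda(rng.standard_normal(3), rng.standard_normal(3))
```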
We study the overparametrization bounds required for the global convergence of the stochastic gradient descent algorithm for a class of one-hidden-layer feed-forward neural networks, considering most of the activation functions used in practice, including ReLU.
We improve the existing state-of-the-art results in terms of the required hidden-layer width.
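For concreteness, here is an illustrative one-hidden-layer feed-forward network of the kind this result concerns, with a configurable activation (ReLU shown) and a plain per-sample SGD step on the squared loss. The width, step size, and ReLU-specific subgradient are placeholder choices, not the quantities the analysis bounds.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def forward(W, a, x, act=relu):
    # f(x) = a^T act(W x): one hidden layer of width m = W.shape[0]
    return a @ act(W @ x)

def sgd_step(W, a, x, y, lr=1e-2):
    pre = W @ x
    h = relu(pre)
    err = a @ h - y                                # residual on this sample
    grad_a = err * h                               # gradient w.r.t. output weights
    grad_W = err * np.outer(a * (pre > 0.0), x)    # ReLU subgradient w.r.t. W
    return W - lr * grad_W, a - lr * grad_a
```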
We fit single-hidden-layer neural networks to data generated by single-hidden-layer ReLU teacher networks whose parameters are drawn from a natural distribution.
We demonstrate that stochastic gradient descent with automated width selection attains small expected error with a number of samples and total number of queries both nearly linear in the input dimension and width.
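A hedged sketch of this teacher-student setting: labels are generated by a random one-hidden-layer ReLU teacher, a student of a given width is fit by SGD, and a simple doubling loop grows the width until the error stops improving. The Gaussian teacher distribution, the growth rule, the thresholds, and the use of training error as a stand-in for validation error are assumptions made for illustration, not the paper's procedure.

```python
import numpy as np

rng = np.random.default_rng(1)
d, k, n = 20, 5, 2000                                # input dim, teacher width, samples

W_t = rng.standard_normal((k, d)) / np.sqrt(d)       # teacher weights (assumed Gaussian)
a_t = rng.standard_normal(k)
X = rng.standard_normal((n, d))
Y = np.maximum(X @ W_t.T, 0.0) @ a_t                 # labels from the ReLU teacher

def fit_student(width, epochs=3, lr=5e-3):
    W = rng.standard_normal((width, d)) / np.sqrt(d)
    a = rng.standard_normal(width) / np.sqrt(width)
    for _ in range(epochs):
        for i in rng.permutation(n):                 # one pass of per-sample SGD
            pre = W @ X[i]
            err = a @ np.maximum(pre, 0.0) - Y[i]
            g_a = err * np.maximum(pre, 0.0)
            g_W = err * np.outer(a * (pre > 0.0), X[i])
            a -= lr * g_a
            W -= lr * g_W
    pred = np.maximum(X @ W.T, 0.0) @ a
    return np.mean((pred - Y) ** 2)                  # training MSE (toy stand-in for validation error)

# crude automated width selection: double the width until the error stops improving
width, best = 2, float("inf")
while width <= 128:
    mse = fit_student(width)
    if mse > 0.95 * best:
        break
    best, width = mse, 2 * width
```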
Multiplicative stochasticity is applied to the learning rate of stochastic optimization algorithms, giving rise to stochastic learning-rate schemes.
In-expectation theoretical convergence results are provided for stochastic gradient descent equipped with this novel stochastic learning-rate scheme in the stochastic setting, as well as convergence results in the online optimization setting.
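A minimal sketch of the idea: at each SGD step the base step size is multiplied by an independent positive random variable. The uniform multiplier and the toy least-squares objective below are illustrative assumptions; the results above concern a general scheme of this form.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((500, 10))
b = A @ rng.standard_normal(10) + 0.1 * rng.standard_normal(500)

def sgd_stochastic_lr(steps=2000, eta=0.05, low=0.5, high=1.5):
    x = np.zeros(10)
    for _ in range(steps):
        i = rng.integers(len(b))
        grad = (A[i] @ x - b[i]) * A[i]        # stochastic gradient of 0.5 * (a_i.x - b_i)^2
        multiplier = rng.uniform(low, high)    # multiplicative stochasticity on the learning rate
        x -= eta * multiplier * grad
    return x

x_hat = sgd_stochastic_lr()
```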
We investigate uniform boundedness properties of iterates and function values along the trajectories of the stochastic gradient descent algorithm and its important momentum variant.
Under smoothness and additional assumptions on the loss function, we show that broad families of step-sizes, including the widely used step-decay and cosine with (or without) restart step-sizes, result in uniformly bounded iterates and function values.
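For reference, here are illustrative versions of the two step-size families mentioned above; the decay factor, decay interval, and restart period are placeholder choices, not the schedules analyzed.

```python
import math

def step_decay(t, eta0=0.1, drop=0.5, every=100):
    # divide the step size by a constant factor every fixed number of iterations
    return eta0 * (drop ** (t // every))

def cosine_with_restart(t, eta0=0.1, period=100):
    # cosine annealing from eta0 down to 0, restarting every `period` iterations
    return 0.5 * eta0 * (1.0 + math.cos(math.pi * (t % period) / period))
```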