Keep Up With the Latest Trending Papers: Computer Science, AI, Machine Learning, and more.

Top Papers in Stochastic Gradient Descent


RSGDA: A Randomized Variant of the Epoch Gradient Descent Ascent Algorithm

Randomized Stochastic Gradient Descent Ascent


Scalable Estimation and Inference with Large-scale or Online Survival Data

With the rapid development of data collection and aggregation technologies in many scientific disciplines, it is becoming increasingly ubiquitous to conduct large-scale or online regression to analyze…


Almost-Sure Convergence Rates of Gradient Descent

Accelerated Almost-Sure Convergence Rates for Nonconvex Stochastic Gradient Descent using Stochastic Learning Rates


Federated Accelerated Stochastic Gradient Descent

We propose Federated Accelerated Stochastic Gradient Descent (FedAc), a
principled acceleration of Federated Averaging (FedAvg, also known as Local
SGD) for distributed optimization. FedAc is the first…
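For context on the entry above: in FedAvg / Local SGD, the baseline that FedAc accelerates, each client runs a few local SGD steps on its own data before the server averages the client models. A minimal sketch on synthetic quadratics (the data, client count, step size, and schedule here are illustrative assumptions, and this is plain FedAvg, not FedAc's acceleration):

```python
import numpy as np

rng = np.random.default_rng(6)

# FedAvg / Local SGD sketch: each client takes local SGD steps, then the
# server averages the resulting models.
clients, d, rounds, local_steps, eta = 8, 4, 100, 5, 0.1
w_star = rng.normal(size=d)
# Client k holds quadratic data: f_k(w) = sum_i (A[k,i] @ (w - w_star))^2 / 2
A = rng.normal(size=(clients, 20, d))

w = np.zeros(d)
for _ in range(rounds):
    local = []
    for k in range(clients):
        wk = w.copy()
        for _ in range(local_steps):
            i = rng.integers(20)                     # sample one local row
            g = A[k, i] * (A[k, i] @ (wk - w_star))  # stochastic gradient
            wk -= eta * g
        local.append(wk)
    w = np.mean(local, axis=0)                       # server-side averaging

print(np.linalg.norm(w - w_star))                    # distance to the shared optimum
```

Because every client's objective is minimized at the same `w_star`, plain averaging already drives the error to zero here; FedAc's contribution (per the title) is a provably faster schedule, which this sketch does not model.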


Benign Underfitting of Stochastic Gradient Descent

We study to what extent may stochastic gradient descent (SGD) be understood
as a "conventional" learning rule that achieves generalization performance by
obtaining a good fit to training data. …


Stochastic gradient descent on Riemannian manifolds

Stochastic gradient descent is a simple approach to find the local minima of
a cost function whose evaluations are corrupted by noise. In this paper, we
develop a procedure extending stochastic gradient…
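The snippet is cut off before the construction, but the standard recipe for SGD on a manifold is: take a stochastic gradient step in the tangent space, then map back onto the manifold (a retraction). A minimal sketch on the unit sphere, estimating the top eigenvector of a covariance matrix from a stream of samples (the data and step-size schedule are illustrative assumptions, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stream of Gaussian samples whose covariance has a known top eigenvector e1.
d = 5
cov = np.diag([5.0, 1.0, 1.0, 1.0, 1.0])
top = np.eye(d)[0]
Z = rng.multivariate_normal(np.zeros(d), cov, size=5000)

x = rng.normal(size=d)
x /= np.linalg.norm(x)                   # start on the unit sphere

for t, z in enumerate(Z, start=1):
    g = -2.0 * (z @ x) * z               # Euclidean gradient of f(x) = -(z @ x)^2
    g_tan = g - (g @ x) * x              # project onto the tangent space at x
    x = x - g_tan / (t + 10)             # SGD step with a decaying rate
    x /= np.linalg.norm(x)               # retract back onto the sphere

print(abs(x @ top))                      # alignment with the true top eigenvector
```

The tangent-space projection and the renormalization are what make this a Riemannian method rather than plain SGD in the ambient space.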


Local Quadratic Convergence of Stochastic Gradient Descent with Adaptive Step Size

Establishing a fast rate of convergence for optimization methods is crucial
to their applicability in practice. With the increasing popularity of deep
learning over the past decade, stochastic gradient…
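The abstract is truncated before the step-size rule, so here is one well-known adaptive choice in this spirit, the stochastic Polyak step size η = f_i(w) / ‖∇f_i(w)‖², on an interpolating least-squares problem; both the rule and the setup are illustrative, not necessarily the paper's:

```python
import numpy as np

rng = np.random.default_rng(1)

# Interpolating least squares: b = A @ w_star exactly, so every per-sample
# loss f_i can be driven to zero, the regime where Polyak steps shine.
n, d = 200, 10
A = rng.normal(size=(n, d))
w_star = rng.normal(size=d)
b = A @ w_star

w = np.zeros(d)
for _ in range(3000):
    i = rng.integers(n)
    r = A[i] @ w - b[i]                      # residual on the sampled row
    loss_i = 0.5 * r * r
    grad_i = r * A[i]
    # Stochastic Polyak-type step: eta = f_i(w) / ||grad f_i(w)||^2
    eta = loss_i / (grad_i @ grad_i + 1e-12)
    w = w - eta * grad_i

print(np.linalg.norm(w - w_star))            # near zero: fast convergence
```

For this quadratic loss the Polyak step reduces to a (half-step) randomized Kaczmarz update, which converges linearly on consistent systems.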


Gradient Descent for Force-Directed Graph Drawing

Graph Drawing by Stochastic Gradient Descent


Global Convergence of the Gradient Descent Algorithm for a Class of One Hidden Layer Feed-Forward Neural Networks

Improved Overparametrization Bounds for Global Convergence of Stochastic Gradient Descent for Shallow Neural Networks


Scaling Sample Complexity in Neural Networks

Is Stochastic Gradient Descent Near Optimal?


SGD Training without a Full Data Shuffle

Stochastic Gradient Descent without Full Data Shuffle


Analysis of Stochastic Gradient Descent in Continuous Time

Stochastic gradient descent is an optimisation method that combines classical
gradient descent with random subsampling within the target functional. In this
work, we introduce the stochastic gradient…
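A common continuous-time model for SGD, in the spirit of the entry above, is a diffusion whose drift is the negative gradient and whose noise reflects subsampling; an Euler–Maruyama simulation on a simple quadratic makes the stationary behaviour visible (the drift, noise level, and discretization below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

# Euler-Maruyama simulation of a diffusion often used to model SGD:
#   dW_t = -f'(W_t) dt + sigma dB_t,   with f(w) = w^2 / 2.
dt, sigma, steps, paths = 0.01, 0.3, 2000, 500
w = np.full(paths, 2.0)                 # every path starts at w = 2

for _ in range(steps):
    drift = -w                          # -f'(w) for the quadratic
    w = w + drift * dt + sigma * np.sqrt(dt) * rng.normal(size=paths)

# This is an Ornstein-Uhlenbeck process; its stationary law is N(0, sigma^2/2),
# and after a long run the empirical moments match it.
mean_w, var_w = w.mean(), w.var()
print(mean_w, var_w)
```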


Learning-Rate Optimization Algorithms

Stochastic Learning Rate Optimization in the Stochastic Approximation and Online Learning Settings


Convergence Rates for Stochastic Approximation on a Boundary

We analyze the behavior of projected stochastic gradient descent focusing on
the case where the optimum is on the boundary of the constraint set and the
gradient does not vanish at the optimum. …
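The boundary regime the abstract describes is easy to reproduce in one dimension: constrain the iterates to [1, ∞) while the unconstrained optimum sits at 0, so the constrained optimum lies on the boundary with a non-vanishing gradient (the objective and step schedule are illustrative, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(3)

# Minimize E[(w - Z)^2] with Z ~ N(0, 1), subject to w >= 1.
# The unconstrained optimum is w = 0, so the constrained optimum is the
# boundary point w = 1, where the expected gradient 2(w - E[Z]) = 2 != 0.
w = 3.0
for t in range(1, 20_001):
    z = rng.normal()
    grad = 2.0 * (w - z)             # stochastic gradient of (w - z)^2
    w = max(1.0, w - grad / t)       # gradient step, then project onto [1, inf)

print(w)                              # pinned at the boundary
```

Because the drift always points into the constraint, the projection is active almost every step, which is exactly the setting where boundary-specific convergence rates matter.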


Stochastic Gradient Descent as Approximate Bayesian Inference

Stochastic Gradient Descent with a constant learning rate (constant SGD)
simulates a Markov chain with a stationary distribution. With this perspective,
we derive several new results. …
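The Markov-chain view is easy to see numerically: with a constant learning rate on a noisy quadratic, the iterates stop converging and instead fluctuate around the optimum with a stationary variance set by the step size (the toy objective and noise model below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)

def constant_sgd_var(eta, steps=200_000):
    """Stationary variance of constant-rate SGD on f(w) = w^2 / 2
    with additive unit-variance gradient noise."""
    w, tail = 0.0, []
    noise = rng.normal(size=steps)
    for t in range(steps):
        w -= eta * (w + noise[t])     # noisy gradient: f'(w) + xi_t
        if t >= steps // 2:           # discard burn-in, keep the chain's tail
            tail.append(w)
    return float(np.var(tail))

# For this chain the stationary variance is eta / (2 - eta) ~ eta / 2, so
# halving the learning rate roughly halves the fluctuations.
v_large, v_small = constant_sgd_var(0.1), constant_sgd_var(0.05)
print(v_large, v_small)
```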


The effective noise of Stochastic Gradient Descent

Stochastic Gradient Descent (SGD) is the workhorse algorithm of deep learning
technology. At each step of the training phase, a mini-batch of samples is
drawn from the training dataset and the weights…
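One way to see the "effective noise" of mini-batch SGD is to measure the gap between the mini-batch gradient and the full-batch gradient: its mean squared norm shrinks roughly like 1/B with batch size B. A sketch on a toy regression problem (the model, batch sizes, and sample counts are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy linear-regression dataset.
n, d = 10_000, 3
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)
w = np.zeros(d)

def grad(rows):
    """Mean-squared-error gradient over the given rows."""
    r = X[rows] @ w - y[rows]
    return 2.0 * X[rows].T @ r / len(rows)

full = grad(np.arange(n))
noise_sq = {}
for B in (16, 256):
    # "Effective noise": mini-batch gradient minus the full-batch gradient.
    gaps = [grad(rng.choice(n, size=B, replace=False)) - full
            for _ in range(500)]
    noise_sq[B] = float(np.mean([g @ g for g in gaps]))

print(noise_sq[16] / noise_sq[256])    # roughly 256 / 16 = 16
```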


Fluctuation-dissipation relations for stochastic gradient descent

The notion of the stationary equilibrium ensemble has played a central role
in statistical mechanics. In machine learning as well, training serves as
generalized equilibration that drives the probability…


Private Weighted Random Walk Stochastic Gradient Descent

We consider a decentralized learning setting in which data is distributed
over nodes in a graph. The goal is to learn a global model on the distributed
data without involving any central entity…
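The basic random-walk SGD protocol is simple to sketch: a single model is carried by a walker that updates it with the current node's local data and then moves to a random neighbour. The ring topology, objective, and step schedule below are illustrative, and the paper's weighting and privacy mechanisms are not modelled:

```python
import numpy as np

rng = np.random.default_rng(7)

# Random-walk SGD on a ring of nodes. Node k's local objective is
# ||w - targets[k]||^2 / 2, so the global optimum is the mean target.
nodes, d = 10, 3
targets = rng.normal(size=(nodes, d))
optimum = targets.mean(axis=0)

w = np.zeros(d)
node = 0
for t in range(1, 50_001):
    g = w - targets[node]                         # local stochastic gradient
    w -= (1.0 / t) * g                            # decaying-step SGD update
    node = (node + rng.choice([-1, 1])) % nodes   # walk to a random neighbour

print(np.linalg.norm(w - optimum))                # close to the global optimum
```

Because the walk's stationary distribution is uniform over the ring, the iterate averages the nodes' objectives over time and approaches the global optimum without any central coordinator.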


Uniform Boundedness Properties of Gradient Descent

On Uniform Boundedness Properties of SGD and its Momentum Variants


Gradient Estimators for Categorical Data

Stochastic gradient descent with gradient estimator for categorical features
