Keep Up With Latest Trending Papers. Computer Science, AI and Machine Learning, and more.

Top Papers in Gradient Descent

Learning to Learn without Gradient Descent by Gradient Descent

We learn recurrent neural network optimizers trained on simple synthetic functions by gradient descent. We show that these learned optimizers exhibit a remarkable degree of transfer…

Learning to learn by gradient descent by gradient descent

The move from hand-designed features to learned features in machine learning has been wildly successful. In spite of this, optimization algorithms are still designed by hand. In this paper we show how…
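The idea can be sketched by treating the optimizer itself as a small parameterized update rule whose parameters are tuned ("meta-trained") on a distribution of simple functions. A minimal sketch, assuming toy quadratic tasks and a two-parameter momentum-style rule; the papers use an RNN optimizer and meta-train it with gradients through the unrolled loop (or reinforcement learning), so random search here is only a stand-in:

```python
import numpy as np

def learned_update(state, grad, params):
    """A tiny parameterized update rule: momentum with learned coefficients.
    (The papers use an RNN here; two scalars keep the sketch readable.)"""
    log_lr, beta = params
    state = beta * state + grad            # running average of gradients
    return state, -np.exp(log_lr) * state  # new state, parameter step

def meta_loss(params, n_steps=20, n_tasks=8, seed=0):
    """Meta-objective: total loss accumulated while this update rule
    optimizes a batch of random quadratic tasks."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_tasks):
        curv = rng.uniform(0.5, 2.0, size=5)   # per-coordinate curvatures
        x = rng.normal(size=5)
        state = np.zeros(5)
        for _ in range(n_steps):
            grad = curv * x                    # gradient of 0.5 * curv * x^2
            state, step = learned_update(state, grad, params)
            x = x + step
            total += 0.5 * np.sum(curv * x * x)
    return total

# Meta-train the optimizer's two parameters. The papers do this with
# gradients through the unrolled loop (or RL); random search is a stand-in.
rng = np.random.default_rng(1)
init = np.array([-3.0, 0.0])                   # (log learning rate, momentum)
best, best_val = init, meta_loss(init)
for _ in range(200):
    cand = best + 0.3 * rng.normal(size=2)
    val = meta_loss(cand)
    if val < best_val:
        best, best_val = cand, val
print("meta-loss before/after:", meta_loss(init), best_val)
```

The transfer claim in the abstract corresponds to evaluating the meta-trained rule on tasks outside the training distribution.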

Sparse Spiking Gradient Descent

There is an increasing interest in emulating Spiking Neural Networks (SNNs) on neuromorphic computing devices due to their low energy consumption. Recent advances have allowed training SNNs to a point…
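The mechanism behind sparse spiking gradients can be sketched as follows: the spike nonlinearity has zero gradient almost everywhere, so training uses a surrogate gradient, and if the surrogate has compact support, only neurons whose membrane potential is near the threshold contribute to the backward pass. A sketch assuming a boxcar surrogate (the paper's exact surrogate and implementation differ):

```python
import numpy as np

def spike_forward(v, threshold=1.0):
    """Non-differentiable spike: fire where the membrane potential
    crosses the threshold."""
    return (v >= threshold).astype(float)

def spike_backward_sparse(v, grad_out, threshold=1.0, width=0.5):
    """Boxcar surrogate gradient: nonzero only near the threshold, so the
    backward pass touches a sparse subset of neurons and the rest can be
    skipped entirely."""
    mask = np.abs(v - threshold) < width
    return grad_out * mask / (2 * width)

rng = np.random.default_rng(0)
v = rng.normal(size=1000)                  # membrane potentials of one layer
spikes = spike_forward(v)
grad = spike_backward_sparse(v, np.ones_like(v))
print("active gradient fraction:", np.mean(grad != 0))
```

On neuromorphic or sparse hardware, the zero entries of the mask are exactly the computations that can be skipped.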

MBGDT: Robust Mini-Batch Gradient Descent

In high dimensions, most machine learning methods are fragile even when only a few outliers are present. To address this, we introduce a new method built on a base learner such as Bayesian regression…
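One standard way to make mini-batch gradient descent robust, sketched below, is to aggregate per-sample gradients with a coordinate-wise median rather than a mean, so a few outliers in a batch cannot drag the update. This is an illustrative robustification, not necessarily the paper's exact construction:

```python
import numpy as np

def robust_minibatch_gd(X, y, lr=0.1, epochs=300, batch=32, seed=0):
    """Linear regression by mini-batch GD, aggregating per-sample gradients
    with a coordinate-wise median so outlier samples barely move the update."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        idx = rng.choice(len(X), size=batch, replace=False)
        residual = X[idx] @ w - y[idx]
        per_sample = residual[:, None] * X[idx]     # one gradient per sample
        w -= lr * np.median(per_sample, axis=0)     # robust aggregation
    return w

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
y = X @ np.array([1.0, -2.0, 0.5])
y[:10] += 100.0                                     # a few gross outliers
w = robust_minibatch_gd(X, y)
print("recovered weights:", w)
```

With mean aggregation the ten corrupted labels would bias the fit; the median update has an almost unchanged fixed point.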

Competitive Gradient Descent

We introduce a new algorithm for the numerical computation of Nash equilibria of competitive two-player games. Our method is a natural generalization of gradient descent to the two-player setting…
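For a bilinear zero-sum game the competitive update has a simple closed form, and it contrasts nicely with naive simultaneous gradient descent/ascent, which spirals away from the equilibrium. A sketch on f(x, y) = x*y with scalar players (the general update solves a linear system involving the mixed Hessian):

```python
import numpy as np

def cgd_step(x, y, eta=0.2):
    """One competitive-gradient-descent step for the zero-sum bilinear game
    f(x, y) = x * y, where x minimizes f and y maximizes it. Each player's
    update solves a local bilinear approximation of the game in closed form."""
    gx, gy = y, x                  # grad_x f = y, grad_y f = x
    d = 1.0                        # mixed derivative d^2 f / dx dy
    dx = -eta * (gx + eta * d * gy) / (1 + eta**2 * d**2)
    dy = eta * (gy - eta * d * gx) / (1 + eta**2 * d**2)
    return x + dx, y + dy

def gda_step(x, y, eta=0.2):
    """Naive simultaneous gradient descent/ascent on the same game."""
    return x - eta * y, y + eta * x

x, y = 1.0, 1.0
xg, yg = 1.0, 1.0
for _ in range(500):
    x, y = cgd_step(x, y)          # contracts toward the equilibrium (0, 0)
    xg, yg = gda_step(xg, yg)      # spirals outward
print("CGD:", (x, y), " GDA:", (xg, yg))
```

Each CGD player anticipates the other's simultaneous move, which is what turns the outward spiral into a contraction.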

Adaptive Gradient Descent without Descent

We present a strikingly simple proof that two rules are sufficient to automate gradient descent: 1) don't increase the stepsize too fast and 2) don't overstep the local curvature…
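The two rules translate almost line-for-line into code. A sketch of the stepsize rule, assuming the variant that grows the step by at most a factor sqrt(1 + theta) and caps it by a local curvature estimate built from consecutive iterates and gradients:

```python
import numpy as np

def adgd(grad, x0, lam0=1e-6, steps=150):
    """Adaptive gradient descent following the two rules:
    1) grow the stepsize by at most sqrt(1 + theta) per iteration,
    2) never overstep the local curvature estimate
       ||x_k - x_{k-1}|| / (2 ||g_k - g_{k-1}||)."""
    x_prev, g_prev = x0, grad(x0)
    lam, theta = lam0, np.inf
    x = x_prev - lam * g_prev
    for _ in range(steps):
        g = grad(x)
        denom = 2 * np.linalg.norm(g - g_prev)
        local = np.linalg.norm(x - x_prev) / denom if denom > 0 else np.inf
        lam_new = min(np.sqrt(1 + theta) * lam, local)
        theta, lam = lam_new / lam, lam_new
        x_prev, g_prev = x, g
        x = x - lam * g
    return x

# No stepsize tuning and no line search on this quadratic:
curv = np.array([1.0, 2.0, 4.0])
x = adgd(lambda z: curv * z, np.ones(3))
print("final distance to optimum:", np.linalg.norm(x))
```

Note that neither rule ever evaluates the function value, matching the "without descent" in the title.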

FairGrad: Fairness Aware Gradient Descent

We tackle the problem of group fairness in classification, where the objective is to learn models that do not unjustly discriminate against subgroups of the population. Most existing approaches are…
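A generic sketch of the group-reweighting flavor of such methods (an illustrative rule, not necessarily FairGrad's exact scheme): multiplicatively raise the weight of any group whose loss exceeds the average, so that lagging groups count more in the next gradient step.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fairness_aware_gd(X, y, groups, lr=0.5, fair_lr=0.2, epochs=200):
    """Logistic regression where each group's samples carry a weight that is
    raised multiplicatively whenever that group's loss exceeds the average."""
    theta = np.zeros(X.shape[1])
    gids = np.unique(groups)
    gw = {g: 1.0 for g in gids}                      # per-group weights
    for _ in range(epochs):
        p = sigmoid(X @ theta)
        losses = -(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))
        per_group = {g: losses[groups == g].mean() for g in gids}
        avg = np.mean(list(per_group.values()))
        for g in gids:
            gw[g] *= np.exp(fair_lr * (per_group[g] - avg))
        sw = np.array([gw[g] for g in groups])
        sw = sw / sw.mean()                          # normalize sample weights
        theta -= lr * X.T @ (sw * (p - y)) / len(y)
    return theta, gw

# Two groups; the smaller group has noisy labels, so it is harder to fit.
rng = np.random.default_rng(2)
n0, n1 = 400, 100
X = rng.normal(size=(n0 + n1, 2))
groups = np.array([0] * n0 + [1] * n1)
y = (X[:, 0] > 0).astype(float)
flip = rng.random(n1) < 0.3
y[n0:][flip] = 1 - y[n0:][flip]
theta, gw = fairness_aware_gd(X, y, groups)
print("group weights:", gw)
```

The harder group ends with the larger weight, which is the mechanism that pushes the model toward equalized group losses.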

Exponentiated Gradient Meets Gradient Descent

The (stochastic) gradient descent and the multiplicative update method are probably the most popular algorithms in machine learning. We introduce and study a new regularization…
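The two families contrasted in this line of work are easy to juxtapose: gradient descent updates additively, while exponentiated gradient (the multiplicative update) scales each coordinate by an exponential of its gradient, which keeps iterates positive and, after renormalization, on the probability simplex. A minimal side-by-side sketch on a linear loss:

```python
import numpy as np

def gd_step(w, g, eta=0.1):
    """Additive (gradient descent) update."""
    return w - eta * g

def eg_step(w, g, eta=0.1):
    """Multiplicative (exponentiated gradient) update: components stay
    positive, and renormalizing keeps w on the probability simplex."""
    w = w * np.exp(-eta * g)
    return w / w.sum()

# Minimize the linear loss <c, w> over the probability simplex.
c = np.array([3.0, 1.0, 2.0])
w_eg = np.ones(3) / 3
w_gd = np.ones(3) / 3
for _ in range(200):
    w_eg = eg_step(w_eg, c)   # mass concentrates on the cheapest coordinate
    w_gd = gd_step(w_gd, c)   # additive steps leave the simplex entirely
print("EG:", np.round(w_eg, 3), " GD:", w_gd)
```

The paper's contribution is a regularizer interpolating between these two geometries; the sketch only shows the two endpoints.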

Continuous vs. Discrete Optimization of Deep Neural Networks

Existing analyses of optimization in deep learning are either continuous, focusing on (variants of) gradient flow, or discrete, directly treating (variants of) gradient descent…

Handbook of Convergence Theorems for (Stochastic) Gradient Methods

This is a handbook of simple proofs of the convergence of gradient and stochastic gradient descent type methods. We consider functions that are Lipschitz, smooth, convex, strongly convex, and/or Polyak-Łojasiewicz…
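A representative theorem from this family, for gradient descent x_{k+1} = x_k - (1/L) grad f(x_k) on a convex, L-smooth function (a standard result of the kind such a handbook collects):

```latex
f(x_k) - f(x^\star) \;\le\; \frac{L\,\lVert x_0 - x^\star \rVert^2}{2k}
\qquad \text{for any minimizer } x^\star .
```

Strong convexity upgrades this 1/k rate to a linear (geometric) rate, which is the next entry in most such tables.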

Scalable Estimation and Inference with Large-scale or Online Survival Data

With the rapid development of data collection and aggregation technologies in many scientific disciplines, it is becoming increasingly ubiquitous to conduct large-scale or online regression to analyze…

Gradient Correction beyond Gradient Descent

The great success neural networks have achieved is inseparable from the application of gradient-descent (GD) algorithms. Based on GD, many variant algorithms have emerged to improve GD optimization…

Neural Variational Gradient Descent

Polymatrix Competitive Gradient Descent

Many economic games and machine learning approaches can be cast as competitive optimization problems where multiple agents are minimizing their respective objective function, which depends on all agents…

RSGDA: A Randomized Variant of the Epoch Gradient Descent Ascent Algorithm

Randomized Stochastic Gradient Descent Ascent

Machine Learning

Neural and Evolutionary Computing

Optimization and Control

Probability

Statistics (Machine Learning)

Convergence of gradient descent for deep neural networks

Optimization by gradient descent has been one of the main drivers of the "deep learning revolution". Yet, despite some recent progress for extremely wide networks, it remains an open problem to understand…

Blind Descent: A Prequel to Gradient Descent

We describe an alternative to gradient descent for backpropagation through a neural network, which we call Blind Descent. We believe that Blind Descent can be used to augment backpropagation by using…
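The gradient-free, propose-and-accept flavor of such methods can be sketched as follows (an illustrative sketch only; the paper's exact Blind Descent update may differ): perturb the weights randomly and keep the perturbation only when the loss decreases.

```python
import numpy as np

def blind_descent(loss, w0, scale=0.1, steps=2000, seed=0):
    """Gradient-free descent: propose a random perturbation of the weights
    and keep it only when the loss decreases."""
    rng = np.random.default_rng(seed)
    w, best = w0.copy(), loss(w0)
    for _ in range(steps):
        cand = w + scale * rng.normal(size=w.shape)
        val = loss(cand)
        if val < best:
            w, best = cand, val
    return w, best

# Fit a small least-squares problem without ever computing a gradient.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -1.0, 2.0])
loss = lambda w: np.mean((X @ w - y) ** 2)
w, final = blind_descent(loss, np.zeros(3))
print("initial loss:", loss(np.zeros(3)), "final loss:", final)
```

Because only loss evaluations are needed, such updates apply even when the network is non-differentiable.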

Implicit Gradient Regularization

Gradient descent can be surprisingly good at optimizing deep neural networks without overfitting and without explicit regularization. We find that the discrete steps of gradient descent implicitly regularize…
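Concretely, backward error analysis in this line of work shows that, for stepsize η, the discrete gradient-descent iterates approximately follow gradient flow not on the loss L but on a modified loss carrying an implicit penalty on the gradient norm:

```latex
\tilde{L}(\theta) \;=\; L(\theta) \;+\; \frac{\eta}{4}\,\bigl\lVert \nabla L(\theta) \bigr\rVert^2
```

Larger stepsizes therefore impose a stronger implicit bias toward flat regions where the loss gradient is small.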

Reparameterizing Mirror Descent as Gradient Descent

Most of the recent successful applications of neural networks have been based on training with gradient descent updates. However, for some small networks, other mirror descent updates learn provably…

Natural Gradient Descent with Generic Metric Spaces

Efficient Natural Gradient Descent Methods for Large-Scale Optimization Problems
