http://mdolab.engin.umich.edu/sites/default/files/Martins2003CSD.pdf The Complex-Step Derivative Approximation
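The complex-step approximation from the Martins et al. paper above can be sketched in a few lines. Unlike finite differences, it has no subtractive cancellation, so the step size can be made extremely small; the function and parameter names here are illustrative:

```python
import numpy as np

def complex_step_derivative(f, x, h=1e-20):
    # f'(x) ≈ Im(f(x + i*h)) / h — no subtraction of nearly equal
    # numbers, so h can be tiny without losing precision.
    return np.imag(f(x + 1j * h)) / h

# Example: d/dx sin(x) at x = 1.0 is cos(1.0)
d = complex_step_derivative(np.sin, 1.0)
```

This works for any function that is analytic and implemented with operations that accept complex inputs (NumPy ufuncs qualify).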

https://arxiv.org/abs/1810.00150 Directional Analysis of Stochastic Gradient Descent via von Mises-Fisher Distributions in Deep Learning

We empirically verify our result using deep convolutional networks and observe a higher correlation between the gradient stochasticity and the proposed directional uniformity than against the gradient norm stochasticity, suggesting that the directional statistics of minibatch gradients are a major factor behind SGD.
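One simple way to probe directional spread, hypothetical here and not the paper's von Mises-Fisher estimator, is the mean pairwise cosine similarity of flattened minibatch gradients: values near 1 mean the directions agree, values near 0 mean they are spread out:

```python
import numpy as np

def pairwise_cosine_mean(grads):
    # grads: (n, d) array, one flattened minibatch gradient per row.
    # Normalize each gradient to unit length, then average the
    # off-diagonal pairwise cosine similarities.
    unit = grads / np.linalg.norm(grads, axis=1, keepdims=True)
    sims = unit @ unit.T
    n = len(grads)
    return (sims.sum() - n) / (n * (n - 1))
```

In practice one would collect the per-minibatch gradients of the same loss at a fixed parameter vector and compare this statistic across training stages.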

https://arxiv.org/abs/1810.02054 Gradient Descent Provably Optimizes Over-parameterized Neural Networks

Over-parameterization and random initialization jointly restrict every weight vector to stay close to its initialization for all iterations, which allows the authors to exploit a strong convexity-like property and show that gradient descent converges at a global linear rate to the global optimum.

https://arxiv.org/abs/1810.11393 Dendritic cortical microcircuits approximate the backpropagation algorithm