  
https://arxiv.org/abs/1802.10026v2 Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs

http://mdolab.engin.umich.edu/sites/default/files/Martins2003CSD.pdf The Complex-Step Derivative Approximation
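The complex-step approximation evaluates a function at a complex-perturbed point and reads the derivative off the imaginary part, avoiding the subtractive cancellation of finite differences. A minimal sketch; the test function is the standard example from the complex-step literature, and the step size is an illustrative choice:

<code python>
import numpy as np

def complex_step_derivative(f, x, h=1e-20):
    # f'(x) ~= Im(f(x + i*h)) / h: no subtraction of nearly equal numbers,
    # so h can be taken extremely small without losing precision.
    return np.imag(f(x + 1j * h)) / h

# Standard test function from the complex-step literature.
f = lambda x: np.exp(x) / np.sqrt(np.sin(x) ** 3 + np.cos(x) ** 3)

print(complex_step_derivative(f, 1.5))  # agrees with the analytic derivative to machine precision
</code>

Compare with a forward difference (f(x + h) - f(x)) / h, whose error grows again once h drops below roughly 1e-8 because of cancellation.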
https://arxiv.org/abs/1810.00150 Directional Analysis of Stochastic Gradient Descent via von Mises-Fisher Distributions in Deep Learning

We empirically verify our result using deep convolutional networks and observe that the gradient stochasticity correlates more strongly with the proposed directional uniformity than with the gradient norm stochasticity, suggesting that the directional statistics of minibatch gradients are a major factor behind SGD.
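A rough way to probe these directional statistics: sample many minibatch gradients at a fixed parameter vector, normalize them, and measure how concentrated the directions are. The sketch below uses a toy linear-regression problem and the mean resultant length of the unit gradients as a concentration proxy; the paper's estimator is based on von Mises-Fisher concentration, so this is only an assumed illustration.

<code python>
import numpy as np

rng = np.random.default_rng(0)

# Toy linear-regression problem; minibatch gradients have a closed form.
n, d = 2048, 20
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)
w = np.zeros(d)  # point at which the gradient noise is probed

def minibatch_grad(idx):
    Xb, yb = X[idx], y[idx]
    return 2.0 * Xb.T @ (Xb @ w - yb) / len(idx)

# Sample many minibatch gradients at the same point and keep only their directions.
batch, samples = 32, 500
dirs = []
for _ in range(samples):
    g = minibatch_grad(rng.choice(n, size=batch, replace=False))
    dirs.append(g / np.linalg.norm(g))
dirs = np.array(dirs)

# Mean resultant length: near 1 = tightly clustered gradient directions,
# near 0 = nearly uniform on the sphere (high directional stochasticity).
print("mean resultant length:", np.linalg.norm(dirs.mean(axis=0)))
</code>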
https://arxiv.org/abs/1810.02054 Gradient Descent Provably Optimizes Over-parameterized Neural Networks

Over-parameterization and random initialization jointly restrict every weight vector to be close to its initialization for all iterations, which allows us to exploit a strong convexity-like property to show that gradient descent converges at a global linear rate to the global optimum.
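To see the "weights stay near initialization" behaviour concretely, here is a hedged sketch: a wide two-layer ReLU network (only the first layer trained, with the 1/sqrt(m) output scaling used in this line of work) fit by full-batch gradient descent, reporting the largest per-neuron distance from the random initialization. The widths, learning rate, and data are arbitrary illustrative choices, not the paper's setup.

<code python>
import numpy as np

rng = np.random.default_rng(0)

# f(x) = (1/sqrt(m)) * sum_r a_r * relu(w_r . x); only W is trained.
n, d, m = 20, 10, 4096                   # samples, input dim, width
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)
y = rng.normal(size=n)
W0 = rng.normal(size=(m, d))             # random initialization
a = rng.choice([-1.0, 1.0], size=m)      # fixed output weights
W, lr = W0.copy(), 0.2

for step in range(500):
    Z = X @ W.T                          # pre-activations, shape (n, m)
    err = (np.maximum(Z, 0) @ a) / np.sqrt(m) - y
    # gradient of 0.5 * ||pred - y||^2 with respect to W
    G = ((err[:, None] * (Z > 0)) * a).T @ X / np.sqrt(m)
    W -= lr * G

pred = (np.maximum(X @ W.T, 0) @ a) / np.sqrt(m)
print("final loss:", 0.5 * np.sum((pred - y) ** 2))
print("max per-neuron drift from init:", np.linalg.norm(W - W0, axis=1).max())
</code>

If the width is large enough, the reported drift should stay small relative to the ~sqrt(d) norm of each initialized neuron even as the training loss approaches zero, which is the property the convexity-like argument exploits.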
https://arxiv.org/abs/1810.11393 Dendritic cortical microcircuits approximate the backpropagation algorithm

https://arxiv.org/abs/1811.03962 A Convergence Theory for Deep Learning via Over-Parameterization