stochastic_gradient_descent [2018/05/07 01:29]
stochastic_gradient_descent [2018/10/07 12:16]
https://arxiv.org/abs/1802.10026v2 Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs
http://mdolab.engin.umich.edu/sites/default/files/Martins2003CSD.pdf The Complex-Step Derivative Approximation
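The complex-step trick evaluates a real function at a complex point x + ih and reads the derivative off the imaginary part, avoiding the subtractive cancellation that limits finite differences. A minimal sketch (the function name is illustrative, not from the paper):

```python
import cmath
import math

def complex_step_derivative(f, x, h=1e-20):
    # f'(x) ~ Im(f(x + i*h)) / h. No subtraction of nearly equal
    # values occurs, so h can be made extremely small without
    # round-off error dominating the estimate.
    return f(complex(x, h)).imag / h

d = complex_step_derivative(cmath.sin, 1.0)
print(d, math.cos(1.0))  # the two values agree to machine precision
```

Note that f must be implemented with complex-analytic operations (here `cmath.sin`); a plain real-valued implementation would discard the imaginary perturbation.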
https://arxiv.org/abs/1810.00150 Directional Analysis of Stochastic Gradient Descent via von Mises-Fisher Distributions in Deep Learning
 We empirically verify our result using deep convolutional networks and observe a higher correlation between the gradient stochasticity and the proposed directional uniformity than with the gradient-norm stochasticity, suggesting that the directional statistics of minibatch gradients are a major factor behind SGD.
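The distinction between norm stochasticity and directional spread of minibatch gradients can be illustrated on a toy problem. This is a rough sketch, not the paper's von Mises-Fisher estimator: it measures the coefficient of variation of gradient norms and the resultant length of the unit gradient directions (near 1 means concentrated directions, near 0 means nearly uniform):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear regression; minibatch gradients of the squared loss.
n, d = 1000, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)
w = np.zeros(d)  # measure gradient statistics at the initial point

def minibatch_grad(batch_size=32):
    idx = rng.choice(n, batch_size, replace=False)
    Xb, yb = X[idx], y[idx]
    return 2.0 / batch_size * Xb.T @ (Xb @ w - yb)

grads = np.stack([minibatch_grad() for _ in range(500)])
norms = np.linalg.norm(grads, axis=1)
dirs = grads / norms[:, None]

# Norm stochasticity: relative spread of the gradient norms.
norm_cv = norms.std() / norms.mean()
# Directional concentration: length of the mean unit gradient.
resultant = np.linalg.norm(dirs.mean(axis=0))
print(norm_cv, resultant)
```

Separating the two statistics this way makes it possible to ask, as the paper does for deep networks, which one tracks the overall gradient stochasticity more closely.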
https://arxiv.org/abs/1810.02054 Gradient Descent Provably Optimizes Over-parameterized Neural Networks
 Over-parameterization and random initialization jointly restrict every weight vector to stay close to its initialization for all iterations, which allows a strong convexity-like property to be exploited to show that gradient descent converges at a global linear rate to the global optimum.
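The "weights stay near initialization" phenomenon is easy to observe numerically. A minimal sketch in the paper's setting (wide two-layer ReLU network, fixed random output weights, full-batch gradient descent on the squared loss); the width, learning rate, and iteration count are illustrative choices, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

# Tiny over-parameterized two-layer ReLU net: m hidden units >> n samples.
n, d, m = 5, 3, 512
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)  # unit-norm inputs
y = rng.normal(size=n)

W0 = rng.normal(size=(m, d))           # hidden weights at initialization
a = rng.choice([-1.0, 1.0], size=m)    # output weights, kept fixed
W = W0.copy()

def forward(W):
    return (np.maximum(X @ W.T, 0.0) @ a) / np.sqrt(m)

lr = 0.5
for _ in range(200):
    r = forward(W) - y                        # residuals, shape (n,)
    mask = (X @ W.T > 0).astype(float)        # ReLU derivative, (n, m)
    grad = ((mask * r[:, None]).T @ X) * (a[:, None] / np.sqrt(m))
    W -= lr * grad

loss = 0.5 * np.sum((forward(W) - y) ** 2)
drift = np.linalg.norm(W - W0) / np.linalg.norm(W0)
print(loss, drift)  # loss shrinks while W stays close to W0
```

Each individual weight vector only needs to move O(1/sqrt(m)), so the relative drift from initialization stays small even as the training loss is driven toward zero, which is the mechanism behind the convexity-like argument.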