 https://arxiv.org/abs/1802.10026v2 Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs
+http://mdolab.engin.umich.edu/sites/default/files/Martins2003CSD.pdf The Complex-Step Derivative Approximation
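The complex-step method approximates a first derivative as f'(x) ≈ Im[f(x + ih)]/h. Because it never subtracts two nearly equal values, h can be made tiny without cancellation error. This is a minimal sketch that assumes f is real-analytic and written with `cmath` functions. The test function, the point x = 1, and both step sizes are arbitrary illustrative choices and do not come from the paper.

```python
import cmath
import math

def complex_step(f, x, h=1e-20):
    """Approximate f'(x) as Im(f(x + i*h)) / h.

    Requires f to be real-analytic and evaluated with complex arithmetic.
    No subtraction of nearly equal numbers occurs, so h can be tiny.
    """
    return f(complex(x, h)).imag / h

def central_difference(f, x, h=1e-6):
    """Classic central finite difference, shown for comparison."""
    return (f(x + h).real - f(x - h).real) / (2.0 * h)

def f(z):
    # Arbitrary smooth example; cmath functions accept complex arguments.
    return cmath.exp(z) * cmath.sin(z)

x = 1.0
exact = math.exp(x) * (math.sin(x) + math.cos(x))  # analytic f'(x)
print("complex-step error:", abs(complex_step(f, x) - exact))
print("central-difference error:", abs(central_difference(f, x) - exact))
```

The central difference needs a moderate h and typically loses several digits to rounding, while the complex step stays accurate to near machine precision. In real code, non-analytic operations such as `abs`, `max`, or branches on the value need complex-safe rewrites before the trick applies.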
+https://arxiv.org/abs/1810.00150 Directional Analysis of Stochastic Gradient Descent via von Mises-Fisher Distributions in Deep learning
+We empirically verify our result using deep convolutional networks and observe a higher correlation between the gradient stochasticity and the proposed directional uniformity than that against the gradient norm stochasticity, suggesting that the directional statistics of minibatch gradients is a major factor behind SGD.
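One way to make "directional statistics of minibatch gradients" concrete is to normalize each gradient to unit length and measure how concentrated the directions are. This sketch is an assumption and not the paper's exact directional-uniformity statistic: it uses the mean resultant length R̄ and the standard approximate von Mises-Fisher concentration estimate kappa ≈ R̄(d − R̄²)/(1 − R̄²) (Banerjee et al., 2005). The gradients are synthetic, produced by a hypothetical helper `fake_minibatch_gradients` that adds Gaussian noise to a fixed "true" gradient.

```python
import math
import random

def unit(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def mean_resultant_length(vectors):
    """Norm of the mean of the unit-normalized vectors, in [0, 1].

    Near 1 when all vectors point the same way; near 0 when their
    directions are spread out over the sphere.
    """
    units = [unit(v) for v in vectors]
    d = len(units[0])
    mean = [sum(u[j] for u in units) / len(units) for j in range(d)]
    return math.sqrt(sum(x * x for x in mean))

def vmf_kappa(r_bar, d):
    """Approximate vMF concentration (Banerjee et al., 2005)."""
    return r_bar * (d - r_bar ** 2) / (1.0 - r_bar ** 2)

def fake_minibatch_gradients(n, d, noise, rng):
    """Hypothetical stand-in for minibatch gradients: fixed direction plus noise."""
    true_grad = [1.0] + [0.0] * (d - 1)
    return [[g + rng.gauss(0.0, noise) for g in true_grad] for _ in range(n)]

rng = random.Random(0)
d = 50
for noise in (0.1, 1.0, 10.0):
    r = mean_resultant_length(fake_minibatch_gradients(200, d, noise, rng))
    print(f"noise={noise:5.1f}  R={r:.3f}  kappa~{vmf_kappa(r, d):.1f}")
```

Larger noise spreads the directions, which lowers both R̄ and the kappa estimate. With n samples, R̄ stays around 1/√n even for perfectly uniform directions, so small values should be read against that floor.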
+https://arxiv.org/abs/1810.02054 Gradient Descent Provably Optimizes Over-parameterized Neural Networks
+Over-parameterization and random initialization jointly restrict every weight vector to be close to its initialization for all iterations, which allows us to exploit a strong convexity-like property to show that gradient descent converges at a global linear rate to the global optimum.
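The setting of this result can be reproduced in miniature. The network is a two-layer ReLU network f(x) = (1/√m) Σ_r a_r ReLU(w_rᵀx). The output weights a_r ∈ {−1, +1} are fixed at random. Only the first-layer weights are trained, by full-batch gradient descent on the squared loss. This sketch makes simplifying assumptions that do not come from the paper. The inputs are orthonormal basis vectors, which have unit norm with no two parallel. The targets are arbitrary. The width m = 1000, the step size, and the step count are illustrative. The sketch records the loss at each step and the largest distance any w_r moves from its initialization.

```python
import math
import random

def relu(z):
    return z if z > 0.0 else 0.0

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def forward(W, a, x):
    # f(x) = (1/sqrt(m)) * sum_r a_r * relu(w_r . x)
    return sum(a_r * relu(dot(w_r, x)) for w_r, a_r in zip(W, a)) / math.sqrt(len(W))

def loss_and_residuals(W, a, X, y):
    resid = [forward(W, a, x) - t for x, t in zip(X, y)]
    return 0.5 * sum(r * r for r in resid), resid

def train(X, y, m=1000, lr=1.0, steps=30, seed=0):
    rng = random.Random(seed)
    d = len(X[0])
    W = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]
    a = [rng.choice((-1.0, 1.0)) for _ in range(m)]  # fixed, never trained
    W0 = [w_r[:] for w_r in W]
    scale = lr / math.sqrt(m)
    losses = []
    for _ in range(steps):
        loss, resid = loss_and_residuals(W, a, X, y)
        losses.append(loss)
        for w_r, a_r in zip(W, a):
            # Activation pattern from the pre-update weights of this neuron.
            active = [dot(w_r, x) >= 0.0 for x in X]
            # dL/dw_r = (1/sqrt(m)) * a_r * sum_i r_i * 1{w_r . x_i >= 0} * x_i
            for i, x in enumerate(X):
                if active[i]:
                    for j in range(d):
                        w_r[j] -= scale * a_r * resid[i] * x[j]
    losses.append(loss_and_residuals(W, a, X, y)[0])
    drift = max(math.dist(w, w0) for w, w0 in zip(W, W0))
    return losses, drift

X = [[1.0 if j == i else 0.0 for j in range(5)] for i in range(5)]  # orthonormal inputs
y = [0.5, -1.0, 0.25, 1.0, -0.5]  # arbitrary targets
losses, drift = train(X, y)
print(f"loss {losses[0]:.3e} -> {losses[-1]:.3e}, max drift {drift:.3f}")
```

With orthonormal inputs the problem decouples per training point, and each residual shrinks by a roughly constant factor per step, so the loss falls geometrically (a linear rate). Increasing m should shrink the reported drift roughly like 1/√m, which mirrors the "weights stay close to initialization" effect described above.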