Over-parameterization and random initialization jointly restrict every weight vector to be close to its initialization for all iterations, which allows us to exploit a strong convexity-like property to show that gradient descent converges at a global linear rate to the global optimum.
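
A minimal NumPy sketch illustrating that claim (an illustrative toy setup, not the paper's exact construction): only the hidden layer of a wide two-layer ReLU network is trained with full-batch gradient descent on random unit-norm inputs and arbitrary targets; the width m = 4096, step size 0.1, and data sizes are assumptions chosen for the demo. The training loss should shrink roughly geometrically while the hidden weights drift only slightly, in relative Frobenius norm, from their random initialization.

<code python>
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 20, 10, 4096        # samples, input dim, hidden width (heavily over-parameterized)
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # unit-norm inputs
y = rng.normal(size=n)                          # arbitrary targets

W0 = rng.normal(size=(m, d))                    # random init of the trained hidden layer
a = rng.choice([-1.0, 1.0], size=m)             # fixed output weights
W, lr = W0.copy(), 0.1

for t in range(301):
    H = np.maximum(X @ W.T, 0.0)                # ReLU features, shape (n, m)
    resid = H @ a / np.sqrt(m) - y              # prediction error, shape (n,)
    loss = 0.5 * np.sum(resid ** 2)
    if t % 50 == 0:
        drift = np.linalg.norm(W - W0) / np.linalg.norm(W0)
        print(f"iter {t:3d}  loss {loss:.4e}  relative drift from init {drift:.4f}")
    # gradient of the squared loss with respect to the hidden weights W
    grad = ((resid[:, None] * (H > 0) * a) / np.sqrt(m)).T @ X
    W -= lr * grad
</code>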

https://arxiv.org/abs/1810.11393 Dendritic cortical microcircuits approximate the backpropagation algorithm