natural_gradient_descent — revisions 2018/02/15 17:34 and 2018/09/01 00:31 (admin)
  
To address all these shortcomings in a unified way, we introduce DiCE, which provides a single objective that can be differentiated repeatedly, generating correct gradient estimators of any order in SCGs. Unlike SL, DiCE relies on automatic differentiation for performing the requisite graph manipulations.
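The "single objective" at the heart of DiCE is built from the MagicBox operator, exp(τ − ⊥(τ)) with ⊥ denoting stop-gradient: it evaluates to 1 in the forward pass but reproduces the score-function gradient under repeated differentiation. A minimal sketch in JAX; the toy log-probability `logp` and the constant cost are illustrative assumptions, not from the paper:

```python
import jax
import jax.numpy as jnp

def magic_box(t):
    # DiCE MagicBox: forward value is exp(0) = 1, but d/dtheta is (dt/dtheta) * 1,
    # because stop_gradient blocks the derivative of the second term only.
    return jnp.exp(t - jax.lax.stop_gradient(t))

def logp(theta):
    # hypothetical sum of log-probabilities of sampled stochastic nodes (toy stand-in)
    return -0.5 * theta ** 2

def dice_objective(theta):
    # surrogate objective: magic_box(logp) * cost; cost treated as a constant here
    cost = 3.0
    return magic_box(logp(theta)) * cost

theta = 0.7
val = dice_objective(theta)           # forward pass: equals cost = 3.0
g = jax.grad(dice_objective)(theta)   # equals cost * dlogp/dtheta = 3.0 * (-theta)
```

Because the same expression can be fed to `jax.grad` again, higher-order estimators come out of plain automatic differentiation, which is the point of the construction.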
https://wiseodd.github.io/techblog/2018/03/14/natural-gradient/

https://arxiv.org/abs/1808.10340 A Coordinate-Free Construction of Scalable Natural Gradient

We explicitly construct a Riemannian metric under which the natural gradient matches the K-FAC update; invariance to affine transformations of the activations follows immediately. We extend our framework to analyze the invariance properties of K-FAC applied to convolutional networks and recurrent neural networks, as well as metrics other than the usual Fisher metric.
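The natural gradient that the blog post linked above walks through is F⁻¹∇L, where F = E[∇log p ∇log pᵀ] is the Fisher matrix. A small sketch under assumed toy details: a three-way categorical model with softmax logits, for which the Fisher has the closed form diag(p) − ppᵀ. That matrix is singular (the logits are only identified up to a shift), so a pseudo-inverse stands in for F⁻¹:

```python
import jax
import jax.numpy as jnp

def log_prob(theta, i):
    # categorical model with softmax-parameterized logits (illustrative choice)
    return jax.nn.log_softmax(theta)[i]

theta = jnp.array([0.2, -0.5, 1.0])
p = jax.nn.softmax(theta)

# exact Fisher: expectation of the outer product of score vectors under the model
grads = jnp.stack([jax.grad(log_prob)(theta, i) for i in range(3)])
fisher = (p[:, None, None] * (grads[:, :, None] * grads[:, None, :])).sum(0)

# closed form for the softmax parameterization: diag(p) - p p^T
fisher_closed = jnp.diag(p) - jnp.outer(p, p)

# natural gradient of a toy loss (negative log-likelihood of class 0),
# using the pseudo-inverse because the softmax Fisher is rank-deficient
loss_grad = jax.grad(lambda th: -log_prob(th, 0))(theta)
nat_grad = jnp.linalg.pinv(fisher) @ loss_grad
```

K-FAC approximates exactly this F⁻¹∇L for deep networks by factoring F into Kronecker products per layer; the paper above constructs a metric under which that approximation is itself an exact natural gradient.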