https://docs.google.com/a/codeaudit.com/document/d/1caZUtQXSluYRsQppO9lZrJhVHO80xfigXhf8ZxfBgAg/edit?usp=sharing

====== Learning Patterns ======

This chapter covers mechanisms that are known to lead to a trained model. Why are neural networks able to generalize? Why does back-propagation eventually lead to convergence? Many questions like these still lack a good theoretical explanation. However, DL is an experimental science, and in practice the simple method of back-propagation is surprisingly effective.

An early objection to neural networks was that the equivalent optimization problem is non-convex, which suggested it would be extremely difficult to train a model to convergence. However, recent research disproves this original intuition. In high-dimensional spaces, a critical point is more likely to be a saddle point than a poor local minimum, so there is a high probability that gradient descent will find a way to continue rolling down the optimization hill.

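As a toy illustration of this point (a minimal sketch using a hypothetical two-variable function, not taken from any of the patterns below), plain gradient descent started slightly off the saddle point of f(x, y) = x^2 - y^2 keeps descending, whereas a genuine local minimum would trap it:

<code python>
# Minimal sketch (illustrative only): gradient descent escapes a saddle point.
# f(x, y) = x**2 - y**2 has a saddle at (0, 0); a poor local minimum would
# trap descent, but the saddle does not.
import numpy as np

def grad(p):
    x, y = p
    return np.array([2.0 * x, -2.0 * y])  # gradient of x^2 - y^2

p = np.array([0.0, 1e-3])  # start just off the saddle point
lr = 0.1
for _ in range(100):
    p = p - lr * grad(p)

print(p)  # x stays at 0, |y| keeps growing: descent has escaped the saddle
</code>
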
The requirements for back-propagation in Deep Learning are surprisingly simple: if one is able to calculate the derivative of each layer with respect to its model parameters, then one can apply it. Back-propagation works extremely well at discovering a convergence basin in which a model has learned to generalize.

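To make that requirement concrete, here is a minimal sketch (toy data and a hypothetical two-layer network, not code from this book) in which each layer contributes only its local derivative and the chain rule assembles the full back-propagation update:

<code python>
# Minimal back-propagation sketch: only per-layer derivatives are needed.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))                   # toy inputs (hypothetical data)
y = (X.sum(axis=1, keepdims=True) > 0) * 1.0   # toy binary targets

W1 = rng.normal(scale=0.1, size=(3, 8))        # hidden layer weights
W2 = rng.normal(scale=0.1, size=(8, 1))        # output layer weights
lr = 0.5

for _ in range(200):
    # forward pass
    h = np.tanh(X @ W1)                        # hidden activations
    p = 1.0 / (1.0 + np.exp(-(h @ W2)))        # sigmoid output probabilities
    # backward pass: chain each layer's local derivative
    d_logits = (p - y) / len(X)                # sigmoid + cross-entropy gradient
    dW2 = h.T @ d_logits
    d_h = (d_logits @ W2.T) * (1.0 - h ** 2)   # tanh'(z) = 1 - tanh(z)^2
    dW1 = X.T @ d_h
    # gradient descent update
    W2 -= lr * dW2
    W1 -= lr * dW1
</code>
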
This chapter covers recurring learning patterns that we find in different neural network architectures. In its most abstract form, learning is a credit assignment problem: as a consequence of observed data, which parts of a model do we need to change, and by how much? We will explore many of the techniques that have been shown to be effective in practice.

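As a minimal, hedged illustration (a hypothetical one-neuron model, not part of the pattern catalog below), the gradient is the usual answer to that credit assignment question: each component indicates how much, and in which direction, its parameter should change given the observed data:

<code python>
# Minimal sketch: the gradient assigns credit/blame to each parameter.
import numpy as np

w = np.array([0.5, -1.0])           # two parameters of a linear model
x = np.array([2.0, 0.1])            # one observed input
target = 1.0

pred = w @ x                        # model output
loss = 0.5 * (pred - target) ** 2   # squared error
grad = (pred - target) * x          # dLoss/dw, one entry per parameter

print(grad)  # the larger-magnitude entry is the parameter most "responsible"
</code>
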
{{http://main-alluviate.rhcloud.com/wp-content/uploads/2016/06/learning.png}}

[[Relaxed Backpropagation]] = [[Credit Assignment]]

[[Stochastic Gradient Descent]]

[[Natural Gradient Descent]]

[[Random Orthogonal Initialization]]

[[Transfer Learning]]

[[Curriculum Training]]

[[DropOut]]

[[Domain Adaptation]]

[[Unsupervised Pretraining]]

[[Differential Training]]

[[Genetic Algorithm]]

[[Unsupervised Learning]]

[[Mutable Layer]]

[[Program Induction]]

[[Learning to Optimize]] (note: different from Meta-Learning)

[[Simulated Annealing]]

[[Meta-Learning]]

[[Continuous Learning]]

[[Feedback Network]]

[[Network Generation]]

[[Learning to Purpose]]

[[Planning to Learn]]

[[Exploration]]

[[Learning to Communicate]]

[[Predictive Learning]]

[[Temporal Learning]]

[[Intrinsic Decomposition]]

[[Herding]]

[[Active Learning]]

[[Primal Dual]]

[[Transport Related]]

[[Structure Evolution]]

[[Self-Supervised Learning]]

[[Knowledge Gradient]]

[[Option Discovery]]

[[Infusion Learning]]

[[Ensemble Reinforcement Learning]]

[[Learning from Demonstration]]

[[Egomotion]]

[[Iterative Teaching]]

[[Reasoning by Analogy]]

**References**

https://arxiv.org/pdf/1606.04838v1.pdf Optimization Methods for Large-Scale Machine Learning

We present a comprehensive theory of a straightforward, yet versatile SG algorithm, discuss its practical behavior, and highlight opportunities for designing algorithms with improved performance. This leads to a discussion about the next generation of optimization methods for large-scale machine learning, including an investigation of two main streams of research on techniques that diminish noise in the stochastic directions and methods that make use of second-order derivative approximations.

Recent Advances in Non-Convex Optimization and its Implications to Learning, Anima Anandkumar, ICML 2016 Tutorial

http://www-anw.cs.umass.edu/~barto/courses/cs687/williams92simple.pdf Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning