https://docs.google.com/a/codeaudit.com/document/d/1ZwFwco9OKjiHFPe51UUoWt0S6nvleUTr3LNZpRFux6w/edit?usp=sharing

====== Collective Learning Patterns ======

Collective Learning describes patterns that employ more than one deep learning network to achieve a goal.

Neural networks are equivalent to stateless functions. The training process, of course, is not stateless: the state is the model representation being learned. Once deployed, however, the model remains static and the neural network behaves as a stateless function.

Computer algorithms have three fundamental constructs: assignment, selection and iteration. Neural networks traditionally have only the former two; adding iteration into the mix takes the cognitive ability of these networks to a whole new level. This chapter covers patterns for using neural networks in combination with iterative algorithms, as in the sketch below.
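
To make the distinction concrete, here is a minimal sketch (not taken from any of the linked patterns; the tiny fixed-weight network, the ''predict'' and ''refine'' names and all numbers are assumptions for illustration). The deployed network is a pure, stateless function; wrapping it in an iterative loop is what adds the extra capability described above.

<code python>
# Illustrative sketch only: a tiny frozen network used as a stateless function,
# then combined with an iterative algorithm (simple input refinement).
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)   # the "state" learned during training,
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)   # frozen once the model is deployed

def predict(x):
    """Deployed network: the same input always yields the same output (stateless)."""
    h = np.tanh(x @ W1 + b1)
    return np.tanh(h @ W2 + b2)

def refine(x, steps=50, lr=0.1, eps=1e-4):
    """Iteration wrapped around the stateless network: nudge the input to raise
    the network's score, using a finite-difference gradient estimate."""
    for _ in range(steps):
        grad = np.array([(predict(x + eps * e) - predict(x)) / eps
                         for e in np.eye(len(x))]).ravel()
        x = x + lr * grad
    return x

x0 = rng.normal(size=4)
print(predict(x0), predict(refine(x0)))  # score before vs. after iterative refinement
</code>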
Note that Fitness and Similarity are relevant in this context. It is also interesting to note the roles of evolution, inheritance and fitness here.

{{http://main-alluviate.rhcloud.com/wp-content/uploads/2016/06/compositelearning.png}}

[[Adversarial Training]] (Dueling Networks)

[[AutoEncoder]]

<del>[[Generative Model]]</del> (moved to Explanatory)

[[Multimodal Alignment]]

<del>[[Attention]]</del> Moved to Memory Patterns

[[Beam Search]] Should be moved to Learning

[[Deep Clustering]] Should be moved to Learning

[[Recommender CNN]]

[[Value Policy RL]] Reinforcement Learning (Value and Policy Nets)

[[Combinatorial Feature Selection]]

[[Hyper-Parameter Tuning]] - Should be moved to Learning

[[Monte Carlo Tree Search]]

[[Graph Based Semi-Supervised Learning]]

<del>[[Meta-Learning]]</del>

[[One-Shot Learning]] (Semi-supervised Learning) - Should be moved to Memory Patterns

[[Imitation Learning]] - Not sure if this is a pattern.

[[Progressive Network]]

[[Similarity Network]]

[[Encoder Decoder]]

[[Sequence to Sequence]]

[[Wide and Deep Learning]] (Cooperative Regularization)

[[In_Layer_Regularization]] Layerwise Regularization, Hidden Layer Regularization

[[Probabilistic Graph]]

[[Game Theoretic Learning]]

[[Stability Training]]

[[High Level Controller]]

[[Asynchronous Training]]

[[Multi-Objective Learning]]

[[Recurrent Reinforcement Learning]]

[[Learning to Communicate]]

[[Triplet Learning]]

[[Meta-Control]]

[[Relational Reasoning]]

[[Operational Motivation]]

**References**

http://biorxiv.org/content/biorxiv/early/2016/06/13/058545.full.pdf Towards an integration of deep learning and neuroscience

We hypothesize that (1) the brain optimizes cost functions, (2) these cost functions are diverse and differ across brain locations and over development, and (3) optimization operates within a pre-structured architecture matched to the computational problems posed by behavior. Such a heterogeneously optimized system, enabled by a series of interacting cost functions, serves to make learning data-efficient and precisely targeted to the needs of the organism.

https://arxiv.org/abs/1708.02556 Multi-Generator Generative Adversarial Nets

The training procedure is formulated as a minimax game among many generators, a classifier, and a discriminator. Generators produce data to fool the discriminator while staying within the decision boundary defined by the classifier as much as possible; the classifier estimates the probability that a sample came from each of the generators; and the discriminator estimates the probability that a sample came from the training data rather than from any of the generators. A theoretical analysis shows that at the equilibrium of this system, the Jensen-Shannon divergence between the equally weighted mixture of all generators' distributions and the real data distribution is minimal, while the Jensen-Shannon divergence among the generators' distributions is maximal. Generators can be trained efficiently by utilizing parameter sharing, thus adding minimal cost to the basic GAN model. Extensive experiments on synthetic and real-world large-scale data sets (CIFAR-10 and STL-10) demonstrate superior performance in generating diverse and visually appealing samples over the latest state-of-the-art GAN variants.
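
The three-player objective described in this abstract can be sketched roughly as follows. This is a toy sketch only, not the paper's implementation: the MLP architectures, the 2-D synthetic data, the hyper-parameters, the particular (non-saturating) loss form and the use of PyTorch are all assumptions, and the paper's parameter sharing among generators is omitted here.

<code python>
# Rough sketch of a multi-generator GAN objective: K generators, one classifier
# (which generator produced a fake sample?) and one discriminator (real vs. fake).
import torch
import torch.nn as nn

K, z_dim, x_dim, batch = 4, 8, 2, 64

def mlp(n_in, n_out):
    return nn.Sequential(nn.Linear(n_in, 32), nn.ReLU(), nn.Linear(32, n_out))

generators = nn.ModuleList([mlp(z_dim, x_dim) for _ in range(K)])
discriminator = mlp(x_dim, 1)   # logit of "sample is real"
classifier = mlp(x_dim, K)      # logits over which generator produced the sample

bce, ce = nn.BCEWithLogitsLoss(), nn.CrossEntropyLoss()
opt_g = torch.optim.Adam(generators.parameters(), lr=1e-3)
opt_dc = torch.optim.Adam(list(discriminator.parameters()) +
                          list(classifier.parameters()), lr=1e-3)

for step in range(200):
    real = torch.randn(batch, x_dim) * 0.5 + 2.0   # toy "real" data
    idx = torch.randint(0, K, (batch,))            # generator index for each fake sample
    z = torch.randn(batch, z_dim)
    fake = torch.stack([generators[i](z[j]) for j, i in enumerate(idx.tolist())])

    # Discriminator: real vs. all generators; classifier: which generator made each fake.
    d_loss = (bce(discriminator(real), torch.ones(batch, 1)) +
              bce(discriminator(fake.detach()), torch.zeros(batch, 1)))
    c_loss = ce(classifier(fake.detach()), idx)
    opt_dc.zero_grad()
    (d_loss + c_loss).backward()
    opt_dc.step()

    # Generators: fool the discriminator while remaining identifiable by the
    # classifier, which pushes the generators toward distinct regions of the data.
    g_loss = bce(discriminator(fake), torch.ones(batch, 1)) + ce(classifier(fake), idx)
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
</code>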