We argue that the estimation of the mutual information between high-dimensional continuous random variables is achievable by gradient descent over neural networks. This paper presents a Mutual Information Neural Estimator (MINE) that is linearly scalable in dimensionality as well as in sample size. MINE is back-propable and we prove that it is strongly consistent. We illustrate a handful of applications in which MINE is successfully applied to enhance the properties of generative models in both unsupervised and supervised settings. We apply our framework to estimate the information bottleneck, and apply it in tasks related to supervised classification problems. Our results demonstrate substantial added flexibility and improvement in these settings.
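At its core, MINE trains a neural "statistics network" by gradient ascent on the Donsker-Varadhan lower bound of the KL divergence between the joint distribution and the product of marginals. Below is a minimal PyTorch sketch of that bound; the architecture, the names, and the within-batch shuffle used to draw marginal samples are illustrative assumptions, not the authors' exact setup.

<code python>
import math
import torch
import torch.nn as nn

class StatisticsNetwork(nn.Module):
    """T(x, z): scores (sample, code) pairs; the architecture is an
    arbitrary illustrative choice, not the paper's."""
    def __init__(self, x_dim, z_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim + z_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x, z):
        return self.net(torch.cat([x, z], dim=1))

def dv_lower_bound(T, x, z):
    """Donsker-Varadhan bound I(X;Z) >= E_joint[T] - log E_marginals[exp(T)],
    estimated on one minibatch. Joint samples are the paired (x, z);
    marginal samples reuse z shuffled across the batch."""
    z_marginal = z[torch.randperm(z.size(0))]
    joint_term = T(x, z).mean()
    marginal_term = torch.logsumexp(T(x, z_marginal), dim=0) - math.log(z.size(0))
    return joint_term - marginal_term  # maximize this by gradient ascent
</code>

In use, an optimizer would ascend dv_lower_bound with respect to the statistics network's parameters, and in the generative-model applications its gradient would also be propagated into the model being regularized.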

https://arxiv.org/pdf/1803.05897v1.pdf Contrasting information theoretic decompositions of modulatory and arithmetic interactions in neural information processing systems

The decompositions that we report here show that contextual modulation has information processing properties that contrast with those of all four simple arithmetic operators, that it can take various forms, and that the form used in our previous studies of artificial nets composed of local processors with both driving and contextual inputs is particularly well-suited to provide the distinctive capabilities of contextual modulation under a wide range of conditions. We argue that the decompositions reported here could be compared with those obtained from empirical neurobiological and psychophysical data under conditions thought to reflect contextual modulation. That would then shed new light on the underlying processes involved. Finally, we suggest that such decompositions could aid the design of context-sensitive machine learning algorithms.

https://arxiv.org/abs/1801.09223v2 Probability Mass Exclusions and the Directed Components of Pointwise Mutual Information

We start by introducing probability mass diagrams, which provide a visual representation of how a prior distribution is transformed to a posterior distribution through exclusions. With the aid of these diagrams, we identify two distinct types of probability mass exclusions, namely informative and misinformative exclusions.
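As a rough numeric illustration (the distribution below is invented, not taken from the paper): pointwise mutual information is positive when observing y excludes prior probability mass mostly from outcomes other than x, and negative when the exclusions fall on x itself, which is the distinction the informative/misinformative split makes precise.

<code python>
import math

# Toy joint distribution p(x, y); the values are illustrative only.
p_xy = {('a', 0): 0.4, ('a', 1): 0.1,
        ('b', 0): 0.1, ('b', 1): 0.4}

def p_x(x):
    return sum(v for (xi, _), v in p_xy.items() if xi == x)

def p_y(y):
    return sum(v for (_, yi), v in p_xy.items() if yi == y)

def pointwise_mi(x, y):
    """i(x; y) = log2( p(x|y) / p(x) ), in bits."""
    return math.log2((p_xy[(x, y)] / p_y(y)) / p_x(x))

print(pointwise_mi('a', 0))  # ~ +0.68: y=0 mostly excludes mass from x='b'
print(pointwise_mi('a', 1))  # ~ -1.32: y=1 excludes mass assigned to x='a'
</code>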

https://arxiv.org/abs/1805.04928v1 Doing the impossible: Why neural networks can be trained at all

In the current work we use the concept of mutual information between successive layers of a deep neural network to elucidate this mechanism and suggest possible ways of exploiting it to accelerate training. We show that adding structure to the neural network that enforces higher mutual information between layers speeds training and leads to more accurate results. High mutual information between layers implies that the effective number of free parameters is exponentially smaller than the raw number of tunable weights.
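How the layer-to-layer mutual information is measured is not spelled out in this excerpt. One simple possibility is a crude histogram (binning) estimator over two activation streams, sketched below on synthetic activations; both the estimator and the data are assumptions for illustration, not the paper's method.

<code python>
import numpy as np

def binned_mutual_information(a, b, bins=16):
    """Histogram estimate of I(A;B) in bits between two 1-D activation
    arrays (e.g. one unit from each of two successive layers)."""
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0  # avoid log(0) on empty bins
    return float((pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum())

# Strongly coupled layers give high MI; independent ones stay near zero.
rng = np.random.default_rng(0)
h1 = rng.normal(size=10_000)
h2_coupled = np.tanh(h1) + 0.05 * rng.normal(size=10_000)
h2_indep = rng.normal(size=10_000)
print(binned_mutual_information(h1, h2_coupled))  # large
print(binned_mutual_information(h1, h2_indep))    # close to 0
</code>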

https://arxiv.org/abs/1805.07249 Dynamic learning rate using Mutual Information

Two approaches are demonstrated: tracking the relative change in mutual information and, additionally, tracking its value relative to a reference measure.
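A sketch of what the first approach might look like in code, with a hypothetical tolerance and multipliers and an externally supplied mutual-information estimate; none of these values or names come from the paper.

<code python>
def update_learning_rate(lr, mi_prev, mi_curr,
                         grow=1.1, shrink=0.5, tol=0.01):
    """Adapt the learning rate from the relative change in a
    mutual-information estimate between epochs (illustrative only)."""
    if mi_prev == 0:
        return lr
    rel_change = (mi_curr - mi_prev) / abs(mi_prev)
    if rel_change > tol:       # MI still rising: grow the step
        return lr * grow
    if rel_change < -tol:      # MI falling: back off
        return lr * shrink
    return lr                  # plateau: leave the rate unchanged

# mi_curr could come from an estimator such as the MINE sketch above,
# evaluated e.g. between an intermediate representation and the labels.
</code>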

https://arxiv.org/abs/1808.06670v1 Learning deep representations by mutual information estimation and maximization

Our method, which we call Deep INFOMAX (DIM), can be used to learn representations with desired characteristics and which empirically outperform a number of popular unsupervised learning methods on classification tasks. DIM opens new avenues for unsupervised learning of representations and is an important step towards flexible formulations of representation learning objectives catered towards specific end-goals.
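The global part of such an objective can be sketched as a discriminator that scores matched versus mismatched (input, representation) pairs and is trained on a Jensen-Shannon-style mutual-information surrogate. This is a minimal, assumption-laden miniature of the idea only; the full method described in the paper also uses local objectives and prior matching.

<code python>
import torch
import torch.nn.functional as F

def jsd_mi_surrogate(scores_joint, scores_mismatched):
    """Jensen-Shannon-style surrogate for mutual information.

    scores_joint: discriminator outputs on matched (input, representation)
    pairs; scores_mismatched: outputs on pairs whose representations were
    shuffled within the batch. Maximizing the returned value w.r.t. both
    encoder and discriminator pushes representations to retain information
    about their inputs.
    """
    e_joint = -F.softplus(-scores_joint).mean()
    e_mismatched = F.softplus(scores_mismatched).mean()
    return e_joint - e_mismatched
</code>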

https://xbpeng.github.io/projects/VDB/index.html Variational Discriminator Bottleneck: Improving Imitation Learning, Inverse RL, and GANs by Constraining Information Flow