# Model Disentanglement

**Intent**

Train a Model so that its parameters are disentangled, which leads to better generalization.

**Motivation**

How can we improve the generalization of our models?

**Structure**

<Diagram>

**Discussion**

With disentanglement we do not want to eliminate anything, but rather to separate the different pieces. As a consequence, the dimensionality of the problem is reduced, because each factor involves only those features that are sensitive to it. This is related to matrix factorization, where we want to factor out properties of our model.

Why can unsupervised learning give rise to features that are less disentangled than the original data?

Unsupervised learning works when anti-causality exists. There is a deep connection between the direction of causality and the ability of unsupervised learning to aid in supervised learning.

A network is an under-constrained system: many Models can perform equivalently well. How can we construct a network that learns Models with less entanglement?

Does disentanglement lead to better generalization? Are disentangled Models less trainable? Are disentangled Models less expressive?

[Two ways to create disentanglement: via an autoencoder and via a variational autoencoder.]

**Known Uses**

**Related Patterns**

<Diagram>

*Relationship to Canonical Patterns*

- Structured Factorization (may need to remove this pattern)

- Autoencoder

**References**

https://arxiv.org/pdf/1606.05579.pdf Early Visual Concept Learning with Unsupervised Deep Learning

deep unsupervised generative models are capable of learning disentangled representations of the visual data generative factors if put under similar learning constraints as those present in the ventral visual pathway in the brain: 1) the observed data is generated by underlying factors that are densely sampled from their respective continuous distributions; and 2) the model is encouraged to perform redundancy reduction and to pay attention to statistical independencies in the observed data. The application of such pressures to an unsupervised generative model leads to the familiar VAE formulation [19, 26] with a temperature coefficient β that regulates the strength of such pressures and, as a consequence, the qualitative nature of the representations learnt by the model.

We have shown that learning disentangled representations leads to useful emergent properties. The ability of trained VAEs to reason about new unseen objects suggests that they have learnt from raw pixels and in a completely unsupervised manner basic visual concepts, such as the “objectness” property of the world.

In order to learn disentangled representations of the generative factors we introduce a constraint that encourages the distribution over latent factors z to be close to a prior that embodies the neuroscience inspired pressures of redundancy reduction and independence prior. This results in a constrained optimisation problem shown in Eq. 2, where ε specifies the strength of the applied constraint.

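$$\max_{\phi,\theta}\; \mathbb{E}_{q_\phi(z|x)}\left[\log p_\theta(x|z)\right] \quad \text{subject to} \quad D_{KL}\left(q_\phi(z|x)\,\|\,p(z)\right) < \epsilon \qquad (2)$$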

Writing Eq. 2 as a Lagrangian we obtain the familiar variational free energy objective function shown in Eq. 3 [19, 26], where β > 0 is the inverse temperature or regularisation coefficient.

$$\mathcal{L}(\theta,\phi;x) = \mathbb{E}_{q_\phi(z|x)}\left[\log p_\theta(x|z)\right] - \beta\, D_{KL}\left(q_\phi(z|x)\,\|\,p(z)\right) \qquad (3)$$

Varying β changes the degree of applied learning pressure during training, thus encouraging different learnt representations. When β = 0, we obtain the standard maximum likelihood learning. When β = 1, we recover the Bayes solution. We postulate that in order to learn disentangled representations of the continuous data generative factors it is important to tune β to approximate the level of learning pressure present in the ventral visual stream.
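
As a rough illustration of Eq. 3, the sketch below evaluates the objective for a single data point. It assumes a diagonal Gaussian posterior, a standard normal prior, a Bernoulli decoder and a single-sample estimate of the expectation; none of these choices is stated in the excerpt above, and the function names are hypothetical.

```python
import math

def gaussian_kl(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions.

    Assumption: diagonal Gaussian posterior and standard normal prior.
    """
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))

def bernoulli_log_likelihood(x, x_hat, eps=1e-7):
    """log p(x|z) for binary targets x and decoder probabilities x_hat.

    Assumption: Bernoulli decoder; probabilities are clamped to avoid log(0).
    """
    total = 0.0
    for xi, pi in zip(x, x_hat):
        pi = min(max(pi, eps), 1.0 - eps)
        total += xi * math.log(pi) + (1.0 - xi) * math.log(1.0 - pi)
    return total

def beta_vae_objective(x, x_hat, mu, log_var, beta):
    """Single-sample estimate of Eq. 3: reconstruction term minus beta times KL.

    This is the quantity to maximise; a training loop would minimise its negative.
    """
    return bernoulli_log_likelihood(x, x_hat) - beta * gaussian_kl(mu, log_var)

# Toy values: raising beta weights the KL term more heavily.
x, x_hat = [1.0, 0.0], [0.9, 0.2]
mu, log_var = [1.0], [0.0]
for beta in (0.0, 1.0, 4.0):
    print(beta, beta_vae_objective(x, x_hat, mu, log_var, beta))
```

With β = 0 only the reconstruction term remains; larger β penalises posteriors that drift away from the prior.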

https://arxiv.org/abs/1606.03657

InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets

InfoGAN is a generative adversarial network that also maximizes the mutual information between a small subset of the latent variables and the observation. We derive a lower bound to the mutual information objective that can be optimized efficiently, and show that our training procedure can be interpreted as a variation of the Wake-Sleep algorithm.
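
The lower bound mentioned above can be written out as follows. The notation is an assumption, since the excerpt defines no symbols: c is the structured subset of latent variables, z the remaining noise, G the generator, D the discriminator, Q an auxiliary distribution approximating the posterior over c, and λ a weighting hyperparameter.

$$I\big(c;\, G(z,c)\big) \;\ge\; \mathbb{E}_{c \sim P(c),\; x \sim G(z,c)}\left[\log Q(c \mid x)\right] + H(c) \;=\; L_I(G, Q)$$

$$\min_{G,Q}\,\max_{D}\; V(D,G) - \lambda\, L_I(G,Q)$$

Here V(D, G) is the usual GAN value function, so the mutual information term acts as an added regulariser on the generator.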

https://arxiv.org/abs/1503.03167 Deep Convolutional Inverse Graphics Network

This paper presents the Deep Convolution Inverse Graphics Network (DC-IGN), a model that learns an interpretable representation of images. This representation is disentangled with respect to transformations such as out-of-plane rotations and lighting variations.

http://arxiv.org/abs/1606.08660v2 Theory reconstruction: a representation learning view on predicate invention

http://arxiv.org/pdf/1602.07349v2.pdf Parsimonious Modelling with Information Filtering Networks

http://arxiv.org/abs/1604.08772 Towards Conceptual Compression

We show that it naturally separates global conceptual information from lower level details, thus addressing one of the fundamentally desired properties of unsupervised learning. Furthermore, the possibility of restricting ourselves to storing only global information about an image allows us to achieve high quality 'conceptual compression'.

In this paper, we introduced convolutional DRAW, a state-of-the-art generative model which demonstrates the potential of sequential computation and recurrent neural networks in scaling up latent variable models. During inference, the algorithm sequentially arrives at a natural stratification of information, ranging from global aspects to low-level details. An interesting feature of the method is that, when we restrict ourselves to storing just the high level latent variables, we arrive at a ‘conceptual compression’ algorithm that rivals the quality of JPEG2000. As a generative model, it outperforms earlier latent variable models on both the Omniglot and ImageNet datasets.

http://arxiv.org/pdf/1509.08731v1.pdf Variational Information Maximisation for Intrinsically Motivated Reinforcement Learning

http://statweb.stanford.edu/~tibs/ftp/tibs-copss.pdf In praise of sparsity and convexity

Hastie et al. (2001) coined the informal “Bet on Sparsity” principle. The l1 methods assume that the truth is sparse, in some basis. If the assumption holds true, then the parameters can be efficiently estimated using l1 penalties. If the assumption does not hold—so that the truth is dense—then no method will be able to recover the underlying model without a large amount of data per parameter.
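
One way to see how an l1 penalty produces sparse estimates is the soft-thresholding operator, which is the proximal step for the l1 norm. The sketch below is illustrative only; the function names and the threshold `lam` are assumptions, not taken from the reference.

```python
def soft_threshold(w, lam):
    """Shrink w towards zero by lam; values within [-lam, lam] become exactly 0."""
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0

def shrink(weights, lam):
    """Apply soft-thresholding element-wise to a list of weights."""
    return [soft_threshold(w, lam) for w in weights]

# Small coefficients are set to zero, large ones are shrunk by lam.
print(shrink([3.0, -0.5, -2.0], 1.0))
```

Unlike an l2 penalty, which only scales weights down, this step sets small weights exactly to zero, which is what makes the resulting model sparse.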

http://arxiv.org/pdf/1607.00485v1.pdf Group Sparse Regularization for Deep Neural Networks

In this paper, we consider the joint task of simultaneously optimizing (i) the weights of a deep neural network, (ii) the number of neurons for each hidden layer, and (iii) the subset of active input features (i.e., feature selection). While these problems are generally dealt with separately, we present a simple regularized formulation allowing to solve all three of them in parallel, using standard optimization routines. Specifically, we extend the group Lasso penalty (originated in the linear regression literature) in order to impose group-level sparsity on the network's connections, where each group is defined as the set of outgoing weights from a unit. Depending on the specific case, the weights can be related to an input variable, to a hidden neuron, or to a bias unit, thus performing simultaneously all the aforementioned tasks in order to obtain a compact network. We perform an extensive experimental evaluation, by comparing with classical weight decay and Lasso penalties. We show that a sparse version of the group Lasso penalty is able to achieve competitive performances, while at the same time resulting in extremely compact networks with a smaller number of input features.
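
A minimal sketch of the penalties described above, assuming each row of `weights` holds the outgoing weights of one unit, so that one row is one group. Scaling each group norm by the square root of the group size is a common group Lasso convention and is an assumption here, as are the function names.

```python
import math

def group_lasso_penalty(weights):
    """Sum over groups of sqrt(group size) * l2 norm of the group.

    weights[i] is the list of outgoing weights of unit i (one group per row).
    """
    return sum(
        math.sqrt(len(row)) * math.sqrt(sum(w * w for w in row))
        for row in weights
    )

def sparse_group_lasso_penalty(weights):
    """Group Lasso term plus an element-wise l1 term."""
    l1 = sum(abs(w) for row in weights for w in row)
    return group_lasso_penalty(weights) + l1

# The second unit has all-zero outgoing weights, so it contributes nothing
# to the penalty and could be removed from the network.
layer = [[3.0, 4.0], [0.0, 0.0]]
print(group_lasso_penalty(layer), sparse_group_lasso_penalty(layer))
```

Because the l2 norm of a group is not squared, the penalty pushes whole rows to zero together, which corresponds to dropping an input feature or a hidden neuron.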

http://arxiv.org/pdf/1602.02383v1.pdf Disentangled Representations in Neural Models

Describes a model which is able to factorize a multitask learning problem into subtasks and which experiences no catastrophic forgetting.

http://arxiv.org/abs/1606.03490v2 The Mythos of Model Interpretability

http://nautil.us/issue/40/learning/is-artificial-intelligence-permanently-inscrutable Is Artificial Intelligence Permanently Inscrutable?

http://openreview.net/pdf?id=SkC_7v5gx THE POWER OF SPARSITY IN CONVOLUTIONAL NEURAL NETWORKS

A surprisingly effective approach to trade accuracy for size and speed is to simply reduce the number of channels in each convolutional layer by a fixed fraction and retrain the network. In many cases this leads to significantly smaller networks with only minimal changes to accuracy. In this paper, we take a step further by empirically examining a strategy for deactivating connections between filters in convolutional layers in a way that allows us to harvest savings both in run-time and memory for many network architectures.

https://arxiv.org/abs/1611.03383v1 Disentangling factors of variation in deep representations using adversarial training

We address the problem of disentanglement in this more general setting by combining deep convolutional autoencoders with a form of adversarial training. Both factors of variation are implicitly captured in the organization of the learned embedding space, and can be used for solving single-image analogies.

https://arxiv.org/abs/1511.07543 Convergent Learning: Do different neural networks learn the same representations?

This initial investigation reveals a few previously unknown properties of neural networks, and we argue that future research into the question of convergent learning will yield many more.

Our main findings include: 1. Some features are learned reliably in multiple networks, yet other features are not consistently learned. 2. Units learn to span low-dimensional subspaces and, while these subspaces are common to multiple networks, the specific basis vectors learned are not. 3. The representation codes are a mix between a local (single unit) code and slightly, but not fully, distributed codes across multiple units. 4. The average activation values of neurons vary considerably within a network, yet the mean activation values across different networks converge to an almost identical distribution.

http://arxiv.org/abs/1607.03738v1 Do semantic parts emerge in Convolutional Neural Networks?

We have analyzed the emergence of semantic parts in CNNs. We have investigated whether the network’s filters learn to respond to semantic parts. We have associated filter stimuli with ground-truth part bounding-boxes in order to perform a quantitative evaluation for different layers, network architectures and supervision levels. Despite promoting this emergence by providing favorable settings and multiple assists, we found that only 34 out of 123 semantic parts in PASCAL-Part dataset [5] emerge in AlexNet [6] fine tuned for object detection [7].

One reason why it's so difficult to assign a meaning to individual neurons is that traditional approaches like the LSTM allow every neuron's activation for one word in a sentence to depend on every other neuron's activation for the previous word. So the activations of all the neurons mix together with each other, and it's unlikely for any one neuron to have a single well-defined meaning.

This new QRNN approach may help interpretability of neurons, because each neuron's activation doesn't depend at all on the past history of any other neurons. This means that neurons are more likely, although not guaranteed, to have independent and well-defined meanings, and these meanings are more likely to be simpler and more human-interpretable.

http://www.computervisionblog.com/2016/06/making-deep-networks-probabilistic-via.html

https://arxiv.org/abs/1612.07843 “What is Relevant in a Text Document?”: An Interpretable Machine Learning Approach

In this paper, we demonstrate that such understanding can be achieved by tracing the classification decision back to individual words using layer-wise relevance propagation (LRP), a recently developed technique for explaining predictions of complex non-linear classifiers. We train two word-based ML models, a convolutional neural network (CNN) and a bag-of-words SVM classifier, on a topic categorization task and adapt the LRP method to decompose the predictions of these models onto words. Resulting scores indicate how much individual words contribute to the overall classification decision. This enables one to distill relevant information from text documents without an explicit semantic information extraction step. We further use the word-wise relevance scores for generating novel vector-based document representations which capture semantic information.

https://arxiv.org/abs/1612.04440 Disentangling Space and Time in Video with Hierarchical Variational Auto-encoders

Our experimental results demonstrate our model's success in factoring its representation, and demonstrate that the model achieves improved performance in transfer learning tasks.

In this paper we have presented a neural network model that learns to decompose the static and temporally varying semantic information in video. We have demonstrated the success of this model in factoring its representation both quantitatively and qualitatively.

https://arxiv.org/pdf/1606.05579v3.pdf Early Visual Concept Learning with Unsupervised Deep Learning

Addressing this problem, we propose an unsupervised approach for learning disentangled representations of the underlying factors of variation. We draw inspiration from neuroscience, and show how this can be achieved in an unsupervised generative model by applying the same learning pressures as have been suggested to act in the ventral visual stream in the brain. By enforcing redundancy reduction, encouraging statistical independence, and exposure to data with transform continuities analogous to those to which human infants are exposed, we obtain a variational autoencoder (VAE) framework capable of learning disentangled factors.

https://openreview.net/pdf?id=B1ElR4cgg ADVERSARIALLY LEARNED INFERENCE

We introduce the adversarially learned inference (ALI) model, which jointly learns a generation network and an inference network using an adversarial process. The generation network maps samples from stochastic latent variables to the data space while the inference network maps training examples in data space to the space of latent variables. An adversarial game is cast between these two networks and a discriminative network is trained to distinguish between joint latent/data-space samples from the generative network and joint samples from the inference network. We illustrate the ability of the model to learn mutually coherent inference and generation networks through the inspections of model samples and reconstructions and confirm the usefulness of the learned representations by obtaining a performance competitive with state-of-the-art on the semi-supervised SVHN and CIFAR10 tasks.

https://arxiv.org/abs/1702.08608v1 A Roadmap for a Rigorous Science of Interpretability

We suggest a taxonomy for rigorous evaluation and expose open questions towards a more rigorous science of interpretable machine learning.

https://arxiv.org/abs/1602.02658 Graying the black box: Understanding DQNs

In recent years there is a growing interest in using deep representations for reinforcement learning. In this paper, we present a methodology and tools to analyze Deep Q-networks (DQNs) in a non-blind manner. Moreover, we propose a new model, the Semi Aggregated Markov Decision Process (SAMDP), and an algorithm that learns it automatically. The SAMDP model allows us to identify spatio-temporal abstractions directly from features and may be used as a sub-goal detector in future work. Using our tools we reveal that the features learned by DQNs aggregate the state space in a hierarchical fashion, explaining its success. Moreover, we are able to understand and describe the policies learned by DQNs for three different Atari2600 games and suggest ways to interpret, debug and optimize deep neural networks in reinforcement learning.

https://arxiv.org/abs/1705.05598 PatternNet and PatternLRP – Improving the interpretability of neural networks

https://openreview.net/forum?id=Sy2fzU9gl beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework

We introduce beta-VAE, a new state-of-the-art framework for automated discovery of interpretable factorised latent representations from raw image data in a completely unsupervised manner. Our approach is a modification of the variational autoencoder (VAE) framework. We introduce an adjustable hyperparameter beta that balances latent channel capacity and independence constraints with reconstruction accuracy. We demonstrate that beta-VAE with appropriately tuned beta > 1 qualitatively outperforms VAE (beta = 1), as well as state of the art unsupervised (InfoGAN) and semi-supervised (DC-IGN) approaches to disentangled factor learning on a variety of datasets (celebA, faces and chairs). Furthermore, we devise a protocol to quantitatively compare the degree of disentanglement learnt by different models, and show that our approach also significantly outperforms all baselines quantitatively. Unlike InfoGAN, beta-VAE is stable to train, makes few assumptions about the data and relies on tuning a single hyperparameter, which can be directly optimised through a hyper parameter search using weakly labelled data or through heuristic visual inspection for purely unsupervised data.

https://arxiv.org/abs/1707.03634v1 Speaker-independent Speech Separation with Deep Attractor Network

We propose a novel deep learning framework for speech separation that addresses both of these important issues. We use a neural network to project the time-frequency representation of the mixture signal into a high-dimensional embedding space. A reference point (attractor) is created in the embedding space to pull together all the time-frequency bins that belong to that speaker. The attractor point for a speaker is formed by finding the centroid of the source in the embedding space which is then used to determine the source assignment. We propose three methods for finding the attractor points for each source, including unsupervised clustering, fixed attractor points, and fixed anchor points in the embedding space that guide the estimation of attractor points. The objective function for the network is standard signal reconstruction error which enables end-to-end operation during both the training and test phases. We evaluate our system on the Wall Street Journal dataset (WSJ0) on two and three speaker mixtures, and report comparable or better performance in comparison with other deep learning methods for speech separation.
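
The centroid step described above can be sketched as follows. The sketch assumes the embeddings of the time-frequency bins and a binary source assignment are already given; turning attractor similarities into masks with a softmax over dot products is an illustrative assumption, and all names are hypothetical.

```python
import math

def attractor_points(embeddings, assignments):
    """Centroid of the embeddings assigned to each source.

    embeddings: one K-dimensional vector per time-frequency bin.
    assignments[c][i]: 1.0 if bin i belongs to source c, else 0.0.
    Assumption: every source owns at least one bin.
    """
    dim = len(embeddings[0])
    points = []
    for source in assignments:
        total = sum(source)
        points.append([
            sum(y * v[k] for y, v in zip(source, embeddings)) / total
            for k in range(dim)
        ])
    return points

def soft_masks(embeddings, points):
    """For each bin, a softmax over sources of the attractor/embedding dot product."""
    masks = []
    for v in embeddings:
        scores = [sum(a_k * v_k for a_k, v_k in zip(a, v)) for a in points]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        norm = sum(exps)
        masks.append([e / norm for e in exps])
    return masks

# Three bins in a 2-D embedding space; the first two belong to source 0.
embeddings = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]]
assignments = [[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
points = attractor_points(embeddings, assignments)
print(points)
print(soft_masks(embeddings, points))
```

Bins whose embeddings lie close to an attractor receive a larger mask value for that source, which is the sense in which the attractor pulls its bins together.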

https://www.oreilly.com/ideas/ideas-on-interpreting-machine-learning

https://arxiv.org/pdf/1703.04730.pdf Understanding Black-box Predictions via Influence Functions

How can we explain the predictions of a blackbox model? In this paper, we use influence functions — a classic technique from robust statistics — to trace a model’s prediction through the learning algorithm and back to its training data, thereby identifying training points most responsible for a given prediction. To scale up influence functions to modern machine learning settings, we develop a simple, efficient implementation that requires only oracle access to gradients and Hessian-vector products. We show that even on non-convex and non-differentiable models where the theory breaks down, approximations to influence functions can still provide valuable information. On linear models and convolutional neural networks, we demonstrate that influence functions are useful for multiple purposes: understanding model behavior, debugging models, detecting dataset errors, and even creating visually indistinguishable training-set attacks. https://worksheets.codalab.org/worksheets/0x2b314dc3536b482dbba02783a24719fd/
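
For orientation, the influence of upweighting a training point z on the loss at a test point is commonly written as below. The notation is an assumption, since the excerpt defines no symbols: L is the loss, θ̂ the fitted parameters, n the number of training points, and H the Hessian of the average training loss.

$$\mathcal{I}_{\text{up,loss}}(z, z_{\text{test}}) = -\,\nabla_\theta L(z_{\text{test}}, \hat\theta)^{\top}\, H_{\hat\theta}^{-1}\, \nabla_\theta L(z, \hat\theta), \qquad H_{\hat\theta} = \frac{1}{n}\sum_{i=1}^{n} \nabla_\theta^{2} L(z_i, \hat\theta)$$

The Hessian-vector products mentioned in the abstract are what allow the inverse-Hessian term to be approximated without forming H explicitly.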

https://arxiv.org/pdf/1707.08475v1.pdf DARLA: Improving Zero-Shot Transfer in Reinforcement Learning

Domain adaptation is an important open problem in deep reinforcement learning (RL). In many scenarios of interest data is hard to obtain, so agents may learn a source policy in a setting where data is readily available, with the hope that it generalises well to the target domain. We propose a new multi-stage RL agent, DARLA (DisentAngled Representation Learning Agent), which learns to see before learning to act. DARLA's vision is based on learning a disentangled representation of the observed environment. Once DARLA can see, it is able to acquire source policies that are robust to many domain shifts - even with no access to the target domain. DARLA significantly outperforms conventional baselines in zero-shot domain adaptation scenarios, an effect that holds across a variety of RL environments (Jaco arm, DeepMind Lab) and base RL algorithms (DQN, A3C and EC).

https://arxiv.org/abs/1711.00889 Structured Generative Adversarial Networks