**Name** Generative Models (move to Explanatory patterns)

**Intent**

Train a network to generate examples.

**Motivation**

How can we generate examples of a given classification label?

**Structure**

<Diagram>

**Discussion**

Generative Models are one of the most intriguing aspects of a DL system. A Generative Model is able to generate representative instances of labeled classes. For example, if a machine is trained to identify dogs, then the system is able to generate images of many different kinds of dogs. What is happening here is that the machine's learning was constrained so that its internal model approximates the intrinsic features of the data. The conjecture is that if a system can generate accurate representations from its internal model, then it has achieved a generalized understanding.

Generative Models are thought to be important for unsupervised learning. The motivation is that the ability to synthesize data requires a kind of understanding. It is also hoped that a good generative model will learn a disentangled representation. To quote Richard Feynman: “What I cannot create, I do not understand.”

There are several approaches to Generative Models. Generative Adversarial Networks (GANs) involve a training process with two dueling networks: a generative network and a discriminative network. The discriminative network attempts to classify whether data was drawn from the training set or created by the generative network, while the generative network attempts to produce data that fools the discriminator. As a consequence, a more robust discriminator and a more robust generator are formed. GANs perform a kind of Turing test and are currently the best generative models for images. GANs, however, are difficult to optimize due to unstable adversarial training dynamics. Furthermore, training the generator to produce a high-dimensional distribution given only a binary signal from the discriminator can be problematic.
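The dueling objective can be sketched numerically. Below is a minimal, framework-free sketch (plain NumPy; the function name `gan_losses` is our own) of the binary cross-entropy losses that the discriminator and the non-saturating generator each minimize, given only the discriminator's probability outputs:

```python
import numpy as np

def gan_losses(d_real, d_fake, eps=1e-8):
    """Losses for the two dueling networks, from discriminator probabilities.

    d_real: D's probabilities on real training data (D wants these near 1)
    d_fake: D's probabilities on generated data   (D wants these near 0)
    """
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    # Discriminator: binary cross-entropy for classifying real vs. generated.
    d_loss = -np.mean(np.log(d_real + eps)) - np.mean(np.log(1.0 - d_fake + eps))
    # Generator (non-saturating form): push D's output on fakes toward 1.
    g_loss = -np.mean(np.log(d_fake + eps))
    return d_loss, g_loss
```

In a full GAN the two losses are minimized alternately by gradient descent on the two networks' parameters; note that only the scalar `d_fake` signal reaches the generator, which is the weak-supervision problem noted above.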

Variational Autoencoders (VAEs) attempt to learn an internal representation that captures the latent variables as Gaussian distributions. This constrained network is trained in an unsupervised manner, which allows us to formalize the problem in the framework of probabilistic graphical models, where we maximize a lower bound on the log-likelihood of the data. Images generated by VAEs tend not to be as sharp as those from other techniques. VAEs reduce the number of dimensions, and this reduction tends to compromise the quality of the generated images. Furthermore, VAEs cannot be used when no closed-form solution exists for the KL-divergence.
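A minimal sketch of the two VAE ingredients mentioned above, assuming a diagonal-Gaussian posterior against a standard-normal prior (the case where the KL term has the required closed form) plus the reparameterization trick; function names are our own:

```python
import numpy as np

def gaussian_kl(mu, logvar):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent
    dimensions. This analytic form is what makes the Gaussian choice convenient."""
    return 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar)

def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I), which in a real autodiff
    framework keeps the path from (mu, logvar) to z differentiable."""
    eps = rng.standard_normal(np.shape(mu))
    return mu + np.exp(0.5 * logvar) * eps
```

The training objective (the evidence lower bound) is then the expected reconstruction log-likelihood minus this KL term.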

InfoGAN is an extension of GAN that is able to learn disentangled models. A plain GAN is problematic in that it is under-constrained: many possible solutions map the Gaussian prior to the data, and this freedom may result in a diffuse and highly entangled model. InfoGAN imposes an additional constraint that maximizes the mutual information between a subset of the latent variables and the observation. The approach has been shown to be extremely effective.
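The extra constraint is implemented as a variational lower bound on the mutual information, estimated by an auxiliary network Q that tries to recover the latent code from the generated sample. A minimal sketch for a categorical code (function and argument names are our own):

```python
import numpy as np

def infogan_mi_penalty(code_onehot, q_logits):
    """Negative variational lower bound on I(c; G(z, c)) for a categorical code.

    code_onehot: (batch, k) one-hot codes that were fed to the generator
    q_logits:    (batch, k) logits of auxiliary network Q predicting c from G(z, c)
    Minimizing this cross-entropy maximizes the mutual-information bound.
    """
    # Numerically stable log-softmax.
    z = q_logits - q_logits.max(axis=1, keepdims=True)
    log_q = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -np.mean(np.sum(code_onehot * log_q, axis=1))
```

This penalty is added (with a weight) to the ordinary generator loss, so the generator is rewarded for producing samples from which the code can be recovered.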

The commonality among the different approaches to Generative Models is that the trained model has significantly fewer parameters than the data used to train it. This constrains the models such that training is forced to find a more efficient representation. This is a form of Regularization by Training.

**Known Uses**

http://arxiv.org/pdf/1607.06025v1.pdf Constructing a Natural Language Inference Dataset using Generative Neural Networks

https://arxiv.org/abs/1605.05396 Generative Adversarial Text to Image Synthesis

**Related Patterns**

<Diagram>

**References**

http://www.deeplearningbook.org/contents/generative_models.html The variational autoencoder or VAE (Kingma, 2013; Rezende et al., 2014) is a directed model that uses learned approximate inference and can be trained purely with gradient-based methods

Generative adversarial nets optimize a kind of Turing test and are currently the basis of the best generative model of images.

http://arxiv.org/pdf/1511.06434v2.pdf UNSUPERVISED REPRESENTATION LEARNING WITH DEEP CONVOLUTIONAL GENERATIVE ADVERSARIAL NETWORKS

One constant criticism of using neural networks has been that they are black-box methods, with little understanding of what the networks do in the form of a simple human-consumable algorithm. In the context of CNNs, Zeiler et al. (Zeiler & Fergus, 2014) showed that by using deconvolutions and filtering the maximal activations, one can find the approximate purpose of each convolution filter in the network. Similarly, using gradient descent on the inputs lets us inspect the ideal image that activates certain subsets of filters (Mordvintsev et al.).

Architecture guidelines for stable Deep Convolutional GANs:

  * Replace any pooling layers with strided convolutions (discriminator) and fractional-strided convolutions (generator).
  * Use batchnorm in both the generator and the discriminator.
  * Remove fully connected hidden layers for deeper architectures.
  * Use ReLU activation in the generator for all layers except for the output, which uses Tanh.
  * Use LeakyReLU activation in the discriminator for all layers.

http://arxiv.org/pdf/1511.06455.pdf

http://arxiv.org/abs/1602.04938 LIME Exploratory

http://arxiv.org/abs/1312.6114 Auto-Encoding Variational Bayes

We introduce a stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case. Our contributions are two-fold. First, we show that a reparameterization of the variational lower bound yields a lower bound estimator that can be straightforwardly optimized using standard stochastic gradient methods. Second, we show that for i.i.d. datasets with continuous latent variables per datapoint, posterior inference can be made especially efficient by fitting an approximate inference model (also called a recognition model) to the intractable posterior using the proposed lower bound estimator.

https://arxiv.org/abs/1502.04623 DRAW: A Recurrent Neural Network For Image Generation

https://openai.com/requests-for-research/#inverse-draw

http://arxiv.org/abs/1606.02185v1 Towards a Neural Statistician

An efficient learner is one who reuses what they already know to tackle a new problem. For a machine learner, this means understanding the similarities amongst datasets. In order to do this, one must take seriously the idea of working with datasets, rather than datapoints, as the key objects to model. Towards this goal, we demonstrate an extension of a variational autoencoder that can learn a method for computing representations, or statistics, of datasets in an unsupervised fashion. The network is trained to produce statistics that encapsulate a generative model for each dataset. Hence the network enables efficient learning from new datasets for both unsupervised and supervised tasks. We show that we are able to learn statistics that can be used for: clustering datasets, transferring generative models to new datasets, selecting representative samples of datasets and classifying previously unseen classes.

https://arxiv.org/abs/1606.00709 f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization

We show that the generative-adversarial approach is a special case of an existing more general variational divergence estimation approach.

Statistical divergences such as the well-known Kullback-Leibler divergence measure the difference between two given probability distributions. A large class of different divergences are the so called f-divergences
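The f-divergence family mentioned above has a simple general form, D_f(P||Q) = Σ_x q(x)·f(p(x)/q(x)) for a convex f with f(1) = 0; KL divergence is recovered with f(t) = t·log t. A minimal sketch for discrete distributions (names are our own):

```python
import numpy as np

def f_divergence(p, q, f):
    """D_f(P || Q) = sum_x q(x) * f(p(x) / q(x)) for discrete distributions
    with strictly positive probabilities (avoids 0 * log 0 issues)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(q * f(p / q)))

kl_f = lambda t: t * np.log(t)          # generator f for KL(P || Q)
pearson_f = lambda t: (t - 1.0) ** 2    # generator f for Pearson chi-squared
```

The f-GAN paper shows that different choices of f yield different adversarial training objectives, with the original GAN corresponding to one particular member of the family.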

https://arxiv.org/abs/1606.03657 InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets

InfoGAN is a generative adversarial network that also maximizes the mutual information between a small subset of the latent variables and the observation.

https://openai.com/blog/generative-models/

http://arxiv.org/pdf/1606.05908v1.pdf Tutorial on Variational Autoencoders

http://arxiv.org/abs/1511.05121 Deep Kalman Filters

http://arxiv.org/pdf/1601.06759v2.pdf Pixel Recurrent Neural Networks

Our method models the discrete probability of the raw pixel values and encodes the complete set of dependencies in the image. Architectural novelties include fast two dimensional recurrent layers and an effective use of residual connections in deep recurrent networks.

http://arxiv.org/pdf/1511.01844v2.pdf A NOTE ON THE EVALUATION OF GENERATIVE MODELS

An evaluation based on samples is biased towards models which overfit and therefore a poor indicator of a good density model in a log-likelihood sense, which favors models with large entropy. Conversely, a high likelihood does not guarantee visually pleasing samples.

We therefore argue Parzen window estimates should be avoided for evaluating generative models, unless the application specifically requires such a loss function. In this case, we have shown that a k-means based model can perform better than the true density. To summarize, our results demonstrate that for generative models there is no one-fits-all loss function, but a proper assessment of model performance is only possible in the context of an application.
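For reference, the criticized Parzen window estimate is a Gaussian kernel density fit to model samples, evaluated at held-out test points; a minimal sketch (names ours, isotropic bandwidth `sigma` assumed):

```python
import numpy as np

def parzen_log_likelihood(samples, test_x, sigma):
    """Gaussian Parzen window estimate of log p(test_x) from model samples.

    samples: (n, d) samples drawn from the generative model
    test_x:  (m, d) held-out points at which to estimate log-likelihood
    """
    n, d = samples.shape
    diffs = test_x[:, None, :] - samples[None, :, :]          # (m, n, d)
    log_k = -0.5 * np.sum(diffs ** 2, axis=2) / sigma ** 2    # log kernel values
    log_norm = np.log(n) + 0.5 * d * np.log(2.0 * np.pi * sigma ** 2)
    m_ = log_k.max(axis=1, keepdims=True)                     # stable log-sum-exp
    return m_[:, 0] + np.log(np.exp(log_k - m_).sum(axis=1)) - log_norm
```

The result is sensitive to the bandwidth `sigma` and to the number of samples, which is part of why the authors argue it should be avoided.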

http://arxiv.org/abs/1509.00519 Importance Weighted Autoencoders

We present the importance weighted autoencoder (IWAE), a generative model with the same architecture as the VAE, but which uses a strictly tighter log-likelihood lower bound derived from importance weighting. In the IWAE, the recognition network uses multiple samples to approximate the posterior, giving it increased flexibility to model complex posteriors which do not fit the VAE modeling assumptions. We show empirically that IWAEs learn richer latent space representations than VAEs, leading to improved test log-likelihood on density estimation benchmarks.
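The tighter IWAE bound is a log of averaged importance weights rather than an average of logs; a minimal sketch given per-sample log-weights log wᵢ = log p(x, zᵢ) − log q(zᵢ|x) (function name ours):

```python
import numpy as np

def iwae_bound(log_w):
    """IWAE lower bound: mean over the batch of log( (1/k) * sum_i w_i ),
    computed stably in log space. log_w has shape (batch, k)."""
    k = log_w.shape[1]
    m = log_w.max(axis=1, keepdims=True)   # log-sum-exp shift for stability
    return float(np.mean(m[:, 0] + np.log(np.exp(log_w - m).sum(axis=1)) - np.log(k)))
```

With k = 1 this reduces to the standard VAE ELBO, and by Jensen's inequality the bound never gets looser as k grows.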

http://arxiv.org/abs/1607.08022v1 Instance Normalization: The Missing Ingredient for Fast Stylization

In this short note, we demonstrate that by replacing batch normalization with instance normalization it is possible to dramatically improve the performance of certain deep neural networks for image generation.
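The swap is a one-line change in where the normalization statistics come from; a minimal NumPy sketch for an NCHW tensor (function name ours):

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    """Instance normalization: each (sample, channel) feature map is normalized
    by its OWN mean/variance over the spatial axes, unlike batch normalization,
    which pools statistics across the whole batch. x has shape (N, C, H, W)."""
    mean = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)
```

Because the statistics are per-image, the stylization result no longer depends on which other images happen to share the batch.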

https://arxiv.org/abs/1609.03499v2 WaveNet: A Generative Model for Raw Audio

This paper introduces WaveNet, a deep neural network for generating raw audio waveforms. The model is fully probabilistic and autoregressive, with the predictive distribution for each audio sample conditioned on all previous ones; nonetheless we show that it can be efficiently trained on data with tens of thousands of samples per second of audio. When applied to text-to-speech, it yields state-of-the-art performance, with human listeners rating it as significantly more natural sounding than the best parametric and concatenative systems for both English and Mandarin. A single WaveNet can capture the characteristics of many different speakers with equal fidelity, and can switch between them by conditioning on the speaker identity. When trained to model music, we find that it generates novel and often highly realistic musical fragments. We also show that it can be employed as a discriminative model, returning promising results for phoneme recognition.

http://distill.pub/2016/deconv-checkerboard/

https://arxiv.org/pdf/1610.09585v1.pdf CONDITIONAL IMAGE SYNTHESIS WITH AUXILIARY CLASSIFIER GANS

http://openreview.net/pdf?id=S1JG13oee B-GAN: UNIFIED FRAMEWORK OF GENERATIVE ADVERSARIAL NETWORKS

We propose a novel algorithm that repeats density ratio estimation and f-divergence minimization. Our algorithm offers a new unified perspective toward understanding GANs and is able to make use of multiple viewpoints obtained from the density ratio estimation research, e.g. what divergence is stable and relative density ratio is useful.

We have proposed a novel unified algorithm to learn a deep generative model from a density ratio estimation perspective. Our algorithm provides the experimental insights that Pearson divergence and estimating relative density ratio are useful to improve the stability of GAN learning. Other insights regarding density ratio estimation would also be useful. GANs are sensitive to data sets, the form of the network and hyper-parameters.

http://matroid.com/scaledml/slides/ilya.pdf Train a GAN such that a small subset of its variables is accurately predictable from the generated sample; it is straightforward to add this constraint.

https://arxiv.org/abs/1605.09674 VIME: Variational Information Maximizing Exploration - Rein Houthooft, Xi Chen, John Schulman, Filip De Turck, Pieter Abbeel. Curiosity-driven exploration with generative models: take actions to maximize “information gain”. Performs extremely well on low-dimensional environments; many previously unsolvable problems become solvable.

https://arxiv.org/pdf/1611.05644v1.pdf Inverting The Generator Of A Generative Adversarial Network

This paper introduces techniques for projecting image samples into the latent space using any pre-trained GAN, provided that the computational graph is available. We evaluate these techniques on both MNIST digits and Omniglot handwritten characters. In the case of MNIST digits, we show that projections into the latent space maintain information about the style and the identity of the digit. In the case of Omniglot characters, we show that even characters from alphabets that have not been seen during training may be projected well into the latent space; this suggests that this approach may have applications in one-shot learning

https://arxiv.org/abs/1605.09304v5 Synthesizing the preferred inputs for neurons in neural networks via deep generator networks

https://github.com/Guim3/IcGAN Invertible Conditional GANs

https://arxiv.org/abs/1606.03498 Improved Techniques for Training GANs

We present a variety of new architectural features and training procedures that we apply to the generative adversarial networks (GANs) framework. We focus on two applications of GANs: semi-supervised learning, and the generation of images that humans find visually realistic. Unlike most work on generative models, **our primary goal is not to train a model that assigns high likelihood to test data**, nor do we require the model to be able to learn well without using any labels.

https://github.com/codeaudit/ganhacks

https://blog.ought.com/nips-2016-875bb8fadb8c#.tna5eeblv

Why are GAN image samples so sharp, whereas variational autoencoder samples aren’t? One hypothesis is that it has something to do with the fact that the loss function for VAEs is the likelihood. But we can make GANs maximize likelihood as well, and GAN samples are still sharp, so this seems less plausible now. The reason probably has more to do with the particular approximation strategy used (e.g., the fact that VAEs optimize a lower bound), or some other component of the model architecture.

Three big open problems for GANs: (1) How do you address the fact that the minimax game between the generator and discriminator may never approach an equilibrium? In other words, how do you build a system using GANs so that you know that it will converge to a good solution? (2) Even if they do converge, current systems still have issues with global structure: they cannot count (e.g. the number of eyes on a dog) and frequently get long-range connections wrong (e.g. they show multiple perspectives as part of the same image). (3) How can we use GANs in discrete settings, such as for generating text?

https://arxiv.org/abs/1611.04273 On the Quantitative Analysis of Decoder-Based Generative Models

We propose to use Annealed Importance Sampling for evaluating log-likelihoods for decoder-based models and validate its accuracy using bidirectional Monte Carlo. Using this technique, we analyze the performance of decoder-based models, the effectiveness of existing log-likelihood estimators, the degree of overfitting, and the degree to which these models miss important modes of the data distribution.

https://arxiv.org/pdf/1701.07875v1.pdf Wasserstein GAN

https://arxiv.org/abs/1610.01945 Connecting Generative Adversarial Networks and Actor-Critic Methods

Here we show that GANs can be viewed as actor-critic methods in an environment where the actor cannot affect the reward. We review the strategies for stabilizing training for each class of models, both those that generalize between the two and those that are particular to that model. We also review a number of extensions to GANs and RL algorithms with even more complicated information flow. We hope that by highlighting this formal connection we will encourage both GAN and RL communities to develop general, scalable, and stable algorithms for multilevel optimization with deep networks, and to draw inspiration across communities.

https://arxiv.org/pdf/1703.02156v1.pdf On the Limits of Learning Representations with Label-Based Supervision

Will the representations learned from these generative methods ever rival the quality of those from their supervised competitors? In this work, we argue in the affirmative, that from an information theoretic perspective, generative models have greater potential for representation learning. Based on several experimentally validated assumptions, we show that supervised learning is upper bounded in its capacity for representation learning in ways that certain generative models, such as Generative Adversarial Networks (GANs) are not.

https://arxiv.org/abs/1605.09782 Adversarial Feature Learning

However, in their existing form, GANs have no means of learning the inverse mapping – projecting data back into the latent space. We propose Bidirectional Generative Adversarial Networks (BiGANs) as a means of learning this inverse mapping, and demonstrate that the resulting learned feature representation is useful for auxiliary supervised discrimination tasks, competitive with contemporary approaches to unsupervised and self-supervised feature learning.

http://guimperarnau.com/blog/2017/03/Fantastic-GANs-and-where-to-find-them

https://willwhitney.github.io/gan-article/ An intuitive introduction to Generative Adversarial Nets

https://arxiv.org/abs/1704.00028 Improved Training of Wasserstein GANs

http://www.offconvex.org/2017/03/30/GANs2/

https://arxiv.org/pdf/1705.02894.pdf Geometric GAN

Generative Adversarial Nets (GANs) represent an important milestone for effective generative models, which has inspired numerous variants seemingly different from each other. One of the main contributions of this paper is to reveal a unified geometric structure in GAN and its variants. Specifically, we show that the adversarial generative model training can be decomposed into three geometric steps: separating hyperplane search, discriminator parameter update away from the separating hyperplane, and the generator update along the normal vector direction of the separating hyperplane. This geometric intuition reveals the limitations of the existing approaches and leads us to propose a new formulation called geometric GAN using an SVM separating hyperplane that maximizes the margin. Our theoretical analysis shows that the geometric GAN converges to a Nash equilibrium between the discriminator and generator. In addition, extensive numerical results show the superior performance of geometric GAN.

https://arxiv.org/abs/1706.00550v1 On Unifying Deep Generative Models

This paper establishes formal connections between deep generative modeling approaches through a new formulation of GANs and VAEs. We show that GANs and VAEs are essentially minimizing KL divergences with opposite directions and reversed latent/visible treatments, extending the two learning phases of the classic wake-sleep algorithm, respectively. The unified view provides a powerful tool to analyze a diverse set of existing model variants, and enables exchanging ideas across research lines in a principled way. For example, we transfer the importance weighting method in the VAE literature for improved GAN learning, and enhance VAEs with an adversarial mechanism. Quantitative experiments show the generality and effectiveness of the imported extensions.

GANs simultaneously learn a metric (defined by the discriminator) to guide the generator learning, which resembles the iterative teacher-student distillation framework where a teacher network is simultaneously learned from structured knowledge (e.g., logic rules) and provides knowledge-informed learning signals for student networks of interest. It is exciting to build formal connections between these approaches and enable incorporation of structured knowledge in deep generative modeling.

A second example is the classic wake-sleep algorithm, where the wake phase reconstructs visibles conditioned on latents, while the sleep phase reconstructs latents conditioned on visibles (i.e., generated samples). Hence, visibles and latents are treated in a completely symmetric manner.

https://arxiv.org/abs/1706.09549v2 Distributional Adversarial Networks

We propose a framework for adversarial training that relies on a sample rather than a single sample point as the fundamental unit of discrimination. Inspired by discrepancy measures and two-sample tests between probability distributions, we propose two such distributional adversaries that operate and predict on samples, and show how they can be easily implemented on top of existing models. Various experimental results show that generators trained with our distributional adversaries are much more stable and are remarkably less prone to mode collapse than traditional models trained with pointwise prediction discriminators. The application of our framework to domain adaptation also results in considerable improvement over recent state-of-the-art.

https://arxiv.org/abs/1705.07642 From optimal transport to generative modeling: the VEGAN cookbook

Our theoretical results include (a) a better understanding of the commonly observed blurriness of images generated by VAEs, and (b) establishing duality between Wasserstein GAN (Arjovsky and Bottou, 2017) and POT for the 1-Wasserstein distance.

https://arxiv.org/abs/1707.05776 Optimizing the Latent Space of Generative Networks

The goal of this paper is to disentangle the contribution of the optimization procedure and the network parametrization to the success of GANs. To this end we introduce and study Generative Latent Optimization (GLO), a framework to train a generator without the need to learn a discriminator, thus avoiding challenging adversarial optimization problems. We show experimentally that GLO enjoys many of the desirable properties of GANs: learning from large data, synthesizing visually-appealing samples, interpolating meaningfully between samples, and performing linear arithmetic with noise vectors.

https://arxiv.org/pdf/1707.09241.pdf Generator Reversal

Whereas the dominant paradigm combines simple priors over codes with complex deterministic models, we propose instead to use more flexible code distributions. These distributions are estimated non-parametrically by reversing the generator map during training. The benefits include: more powerful generative models, better modeling of latent structure and explicit control of the degree of generalization.

https://arxiv.org/abs/1708.08819 Coulomb GANs: Provably Optimal Nash Equilibria via Potential Fields

Generative adversarial networks (GANs) evolved into one of the most successful unsupervised techniques for generating realistic images. Even though it has recently been shown that GAN training converges, GAN models often end up in local Nash equilibria that are associated with mode collapse or otherwise fail to model the target distribution. We introduce Coulomb GANs, which pose the GAN learning problem as a potential field, where generated samples are attracted to training set samples but repel each other. The discriminator learns a potential field while the generator decreases the energy by moving its samples along the vector (force) field determined by the gradient of the potential field. Through decreasing the energy, the GAN model learns to generate samples according to the whole target distribution and does not only cover some of its modes. We prove that Coulomb GANs possess only one Nash equilibrium which is optimal in the sense that the model distribution equals the target distribution. We show the efficacy of Coulomb GANs on a variety of image datasets. On LSUN and celebA, Coulomb GANs set a new state of the art and produce a previously unseen variety of different samples.

http://www.shakirm.com/slides/DeepGenModelsTutorial.pdf

https://arxiv.org/abs/1705.09783 Good Semi-supervised Learning that Requires a Bad GAN

https://arxiv.org/pdf/1710.07035.pdf Generative Adversarial Networks: An Overview

https://openreview.net/forum?id=ByQpn1ZA- Many Paths to Equilibrium: GANs Do Not Need to Decrease a Divergence At Every Step