# Compressed Sensing

Aliases Structured Signal Recovery

Intent

Recover data given only a few sample data points.

Motivation

How can we recover the essence of the original data given only a few data samples?

Structure

<Diagram>

Discussion

The prevailing dogma of signal processing is that a signal must be sampled at a rate at least twice its highest frequency in order for it to be represented without error. The theory of Compressed Sensing (CS) shows that a representation with acceptable error can be reconstructed with significantly less data. In DL the feature space of a input representation is typical of a high dimension. CS provides motivation towards discovering a basis wherein the dimensions are much lower (i.e. Sparse). We would there like to learn a model that maps a representation into a space with as few dimensions as required.

CS is based on a robust and rich mathematical foundation that was formulated by Emmanuel Candes, Terence Tao (2006 Fields Medalist), and David Donoho. CS methods are able exceed Nyquist sampling limits by leveraging the assumption that an input signal originates from a sparse basis. The theory is that there are three requirements for CS:

1. Sparsity - which requires the original data is sparse in some domain.
2. Random Sampling - where data is randomly sampled in the sparse domain.
3. Reconstruction - where data is reconstructed through L0 regularization.

The method works by applying an L1 regularization to iteratively discover a sparse solution that is able to reconstruct the original data. CS is able to recover the original data from a few random measurements.

CS is a signal recovery method, the algorithm performs a expensive optimization that involves discovering a sparse basis to reconstruct a original image from given a small number of samples. CS is thus performing a kind of generalization wherein with just a few samples (ex. pixels) it is able to regenerate an entire dataset (ex. image). Can Deep Learning systems perform a similar kind of signal recovery? It turns out that Deep Learning networks however can be shown to competitive in this scenario. This has been studied in [DEEPCS2016] where the researchers have employed a Stacked Denoising Autoencoder (SDA) and have compared it favorable against other classical CS based systems.

The SDA approach was faster in reconstruction than the CS methods. This is because CS methods require a solution to an optimization problem. This is done by using either convex optimization algorithms such as linear programming or using greedy algorithms that are iterative in nature. A DL based SDA however does not iteratively solve an optimization problem. Rather it just requires a single feed-forward pass through the inference path to recover the original images from their measurements. It is as if the SDA approach is able to “learn to deconstruct”.

How is it that a neural network is able to be competitive to an iterative optimization method? This would require that the CS optimization method is computationally reducible, or in other words there exists a short cut approximate solution to the iterative CS optimization algorithm. The SDA approach is an application of the Generative Model pattern, that is it generates an image from limited data.

Generative Models are trained against the class of images that they wish to regenerate. CS however is a generic reconstruction method for any image. Does this then imply that a neural network able to approximate a generic CS optimization algorithm for a wider set of images? The question of using neural networks as approximations to optimization methods is also discussed in the Meta-Learning pattern.

Although CS was developed in a field different from neural networks, CS has a rich mathematical framework that can be leveraged to understand how we can achieve greater generalization in neural networks. Generalization in neural networks can be achieved by learning Disentangled Models. CS is able to generalize (i.e. reconstruct) an image just from a few randomly sampled points Therefore, a DNN learning a CS optimization method may be employed as an effective Generative Model. This is fertile ground for more research.

SINDy is an example of a method that is derived from CS that is able to discover the non-linear equations of a system by sampling data [SINDY]. SINDy leverages the observation that most physical systems have a few nonlinear terms that determine the dynamics. The advent of CS with its sparsity assumptions provides a scalable algorithm to discover models. The fitted nonlinear model attempts to balance model with accuracy. SINDy reveals an approach of model fitting where we can leverage ideas from CS to discover sparse models. This approach can be extended further by applying other kinds of regularization beyond L1 such as disentanglement. The effectiveness of SINDy implies that DNN should also be able to effective approximate the dynamics of non-linear systems.

The implication of these “learning to deconstruct” DNN is that they should be able to perform at a high level of classification accuracy given only a few sampled data points. Taking this even further, if a system is able to generate a larger data set given just a data samples, then could it be possible to train a neural network from just a few data samples? Neural networks are limited by the need to be trained with many data samples. Ideally we would like to have a training process where only a few data samples are needed and a CS based generalization capability is used to generate the remaining dataset. This is the question behind research on One-Shot Learning.

Known Uses

Related Patterns

<Diagram>

Relationship to Canonical Patterns

References

[SINDY] http://www.pnas.org/content/113/15/3932.full.pdf Discovering governing equations from data by sparse identification of nonlinear dynamical systems

https://arxiv.org/pdf/1606.05579.pdf Early Visual Concept Learning with Unsupervised Deep Learning

deep unsupervised generative models are capable of learning disentangled representations of the visual data generative factors if put under similar learning constraints as those present in the ventral visual pathway in the brain: 1) the observed data is generated by underlying factors that are densely sampled from their respective continuous distributions; and 2) the model is encouraged to perform redundancy reduction and to pay attention to statistical independencies in the observed data. The application of such pressures to an unsupervised generative model leads to the familiar VAE formulation [19, 26] with a temperature coefficient β that regulates the strength of such pressures and, as a consequence, the qualitative nature of the representations learnt by the model.

We have shown that learning disentangled representations leads to useful emergent properties. The ability of trained VAEs to reason about new unseen objects suggests that they have learnt from raw pixels and in a completely unsupervised manner basic visual concepts, such as the “objectness” property of the world.

In order to learn disentangled representations of the generative factors we introduce a constraint that encourages the distribution over latent factors z to be close to a prior that embodies the neuroscience inspired pressures of redundancy reduction and independence prior. This results in a constrained optimisation problem shown in Eq. 2, where  specifies the strength of the applied constraint.

max φ,θ Eqφ(z|x) [log pθ(x|z)] subject to DKL(qφ(z|x)||p(z)) < . (2)

Writing Eq. 2 as a Lagrangian we obtain the familiar variational free energy objective function shown in Eq. 3 [19, 26], where β > 0 is the inverse temperature or regularisation coefficient.

L(θ, φ; x) = Eqφ(z|x) [log pθ(x|z)] − β DKL(qφ(z|x)||p(z)) (3)

Varying β changes the degree of applied learning pressure during training, thus encouraging different learnt representations. When β = 0, we obtain the standard maximum likelihood learning. When β = 1, we recover the Bayes solution. We postulate that in order to learn disentangled representations of the continuous data generative factors it is important to tune β to approximate the level of learning pressure present in the ventral visual stream.

InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets

InfoGAN is a generative adversarial network that also maximizes the mutual information between a small subset of the latent variables and the observation. We derive a lower bound to the mutual information objective that can be optimized efficiently, and show that our training procedure can be interpreted as a variation of the Wake-Sleep algorithm.

To calculate the model parameters, we minimize a cross entropy cost function. To reconstruct the sparse vectors at the decoder, we propose a greedy solver that uses the above model to estimate the conditional probabilities. By performing extensive experiments on two real world datasets, we show that the proposed method significantly outperforms the general MMV solver (the Simultaneous Orthogonal Matching Pursuit (SOMP)) and the modelbased Bayesian methods including Multitask Compressive Sensing  and Sparse Bayesian Learning for Temporally Correlated Sources .

[DEEPCS2016] http://arxiv.org/abs/1508.04065 A Deep Learning Approach to Structured Signal Recovery

In contrast to compressive sensing (CS) systems that employ linear measurements, sparse representations, and computationally complex convex/greedy algorithms, we introduce a deep learning framework that supports both linear and mildly nonlinear measurements, that learns a structured representation from training data, and that efficiently computes a signal estimate.

We showed how a stacked denoising autoencoder (SDA), enables us to capture statistical dependencies between the different elements of certain signals and improve signal recovery performance as compared to the CS approach.

http://arxiv.org/pdf/1601.06892v2.pdf ReconNet: Non-Iterative Reconstruction of Images from Compressively Sensed Random Measurements

We propose a novel convolutional neural network (CNN) architecture which takes in CS measurements of an image as input and outputs an intermediate reconstruction. We call this network, ReconNet. The intermediate reconstruction is fed into an off-the-shelf denoiser to obtain the final reconstructed image.

http://arxiv.org/abs/1109.4424v4 Statistical physics-based reconstruction in compressed sensing

We design a new procedure which is able to reconstruct exactly the signal with a number of measurements that approaches the theoretical limit in the limit of large systems. It is based on the joint use of three essential ingredients: a probabilistic approach to signal reconstruction, a message-passing algorithm adapted from belief propagation, and a careful design of the measurement matrix inspired from the theory of crystal nucleation.

http://arxiv.org/pdf/1606.01519v1.pdf A Deep Learning Approach to Block-based Compressed Sensing of Images

http://arxiv.org/abs/1607.05966 Onsager-corrected deep learning for sparse linear inverse problems

We consider the application of deep learning to the sparse linear inverse problem encountered in compressive sensing, where one seeks to recover a sparse signal from a small number of noisy linear measurements. In this paper, we propose a novel neural-network architecture that decouples prediction errors across layers in the same way that the approximate message passing (AMP) algorithm decouples them across iterations: through Onsager correction.

https://arxiv.org/pdf/0907.3574v1.pdf Message Passing Algorithms for Compressed Sensing

We introduce a simple costless modification to iterative thresholding making the sparsity-undersampling tradeoff of the new algorithms equivalent to that of the corresponding convex optimization procedures. The new iterative-thresholding algorithms are inspired by belief propagation in graphical models. There is a surprising agreement between earlier calculations based on random convex polytopes and this new, apparently very different theoretical formalism.

https://arxiv.org/abs/1609.00285v3 Understanding Neural Sparse Coding with Matrix Factorization

An acceleration using neural networks was proposed, coined LISTA, which showed empirically that one could achieve high quality estimates with few iterations by modifying the parameters of the proximal splitting appropriately. In this paper we study the reasons for such acceleration.

http://yann.lecun.com/exdb/publis/pdf/gregor-icml-10.pdf Learning Fast Approximations of Sparse Coding

http://openreview.net/pdf?id=H1VTBnixg LAYERED CONVOLUTIONAL DICTIONARY LEARNING FOR SPARSE CODING ITEMSETS

In this paper, the idea of sparse representation learning for itemsets to achieve the best data compression is introduced and validated. We formally state our targeted problem as an optimization problem and prove it to be NP-hard. Our proposed matching pursuit algorithm greedily solves the max set cover problem to infer basis which minimize overall compression loss. A convolutional dictionary learning approach hierarchically finds statistically dependent patterns. Our extensive empirical validation shows that mined patterns are more informative and dependent. CSI also improves the efficacy of deep neural networks. Incorporating sequential patterns for sparse representation learning remains a rich area for further research.

https://arxiv.org/pdf/1703.03208v1.pdf Compressed Sensing using Generative Models

The goal of compressed sensing is to estimate a vector from an underdetermined system of noisy linear measurements, by making use of prior knowledge on the structure of vectors in the relevant domain. For almost all results in this literature, the structure is represented by sparsity in a well-chosen basis. We show how to achieve guarantees similar to standard compressed sensing but without employing sparsity at all. Instead, we suppose that vectors lie near the range of a generative model G:Rk→Rn. Our main theorem is that, if G is L-Lipschitz, then roughly O(klogL) random Gaussian measurements suffice for an ℓ2/ℓ2 recovery guarantee. We demonstrate our results using generative models from published variational autoencoder and generative adversarial networks. Our method can use 5-10x fewer measurements than Lasso for the same accuracy.

https://arxiv.org/pdf/1705.08664.pdf Towards Understanding the Invertibility of Convolutional Neural Networks

We give an exact connection to a particular model of model-based compressive sensing (and its recovery algorithms) and random-weight CNNs. We show empirically that several learned networks are consistent with our mathematical analysis and then demonstrate that with such a simple theoretical framework, we can obtain reasonable reconstruction results on real images.