Name Weight Quantization (aka Binarization)
Intent
Reduce memory requirements by using weights of lower precision.
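A minimal NumPy sketch of the idea (illustrative only, not any specific paper's method): a float32 weight tensor is replaced by one real-valued scale plus its sign bits, cutting weight storage roughly 32x while keeping a usable approximation of the original weights.

```python
import numpy as np

def binarize(weights):
    """Approximate a float32 weight tensor W as alpha * sign(W).

    Storing one sign bit per weight plus a single float scale replaces
    32 bits per weight, reducing weight memory by roughly 32x.
    """
    alpha = np.abs(weights).mean()            # per-tensor scaling factor
    signs = np.where(weights >= 0, 1.0, -1.0) # 1-bit codes in {-1, +1}
    return alpha, signs

w = np.random.randn(4, 4).astype(np.float32)
alpha, b = binarize(w)
w_hat = alpha * b   # dequantized approximation used at inference time
```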
Motivation
Structure
<Diagram>
Discussion
Known Uses
Related Patterns
<Diagram>
References
http://arxiv.org/pdf/1606.00185v1.pdf A Survey on Learning to Hash
The quantization approach can be derived from the distance-distance difference minimization criterion.
https://arxiv.org/abs/1606.01981 Deep neural networks are robust to weight binarization and other non-linear distortions
https://arxiv.org/pdf/1607.06450v1.pdf Layer Normalization
One way to reduce the training time is to normalize the activities of the neurons. In this paper, we transpose batch normalization into layer normalization by computing the mean and variance used for normalization from all of the summed inputs to the neurons in a layer on a single training case. Like batch normalization, we also give each neuron its own adaptive bias and gain which are applied after the normalization but before the non-linearity. Unlike batch normalization, layer normalization performs exactly the same computation at training and test times. It is also straightforward to apply to recurrent neural networks by computing the normalization statistics separately at each time step. Layer normalization is very effective at stabilizing the hidden state dynamics in recurrent networks. Empirically, we show that layer normalization can substantially reduce the training time compared with previously published techniques.
The paper also analyzes invariance properties under the different normalization methods.
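A minimal NumPy sketch of the normalization described above, for a single training case: the mean and variance are taken over the summed inputs to all neurons in one layer, and a per-neuron gain and bias are applied before the non-linearity (the layer size and epsilon below are illustrative).

```python
import numpy as np

def layer_norm(a, gain, bias, eps=1e-5):
    """Layer normalization of one layer's summed inputs for a single case.

    Statistics are computed over the neurons of the layer, not over the
    batch, so the computation is identical at training and test time.
    """
    mu = a.mean()
    sigma = a.std()
    return gain * (a - mu) / (sigma + eps) + bias

a = np.random.randn(128)          # summed inputs of a 128-unit layer
g = np.ones(128); b = np.zeros(128)
h = np.tanh(layer_norm(a, g, b))  # normalization precedes the non-linearity
```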
http://arxiv.org/abs/1608.06902v1 Recurrent Neural Networks With Limited Numerical Precision
This paper addresses the question of how best to reduce weight precision during training in the case of RNNs. We present results from the use of different stochastic and deterministic reduced-precision training methods applied to three major RNN types, which are then tested on several datasets. The results show that the weight binarization methods do not work with the RNNs. However, the stochastic and deterministic ternarization and pow2-ternarization methods gave rise to low-precision RNNs that produce similar and even higher accuracy on certain datasets, thereby providing a path towards training more efficient implementations of RNNs in specialized hardware.
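A rough sketch of deterministic ternarization in NumPy; the fixed threshold below is an assumption for illustration, and the paper's stochastic and pow2-ternarization variants choose the threshold and scaling differently.

```python
import numpy as np

def ternarize(weights, threshold=0.5):
    """Deterministic ternarization: map each weight to {-1, 0, +1}.

    Weights with magnitude below `threshold` become 0; the rest keep
    only their sign. The threshold value is illustrative only.
    """
    t = np.zeros_like(weights)
    t[weights > threshold] = 1.0
    t[weights < -threshold] = -1.0
    return t

w = np.random.randn(6)
print(ternarize(w))
```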
http://arxiv.org/abs/1602.02830v3 Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1
http://arxiv.org/abs/1603.05279 XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks
In XNOR-Networks, both the filters and the inputs to convolutional layers are binary, and convolutions are approximated using primarily binary operations. A figure in the paper illustrates the procedure of Section 3.2 for approximating a convolution with binary operations.
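A minimal sketch of the weight half of that approximation: each real-valued filter W is replaced by alpha * B with binary B, where alpha = mean(|W|) is the optimal scale for B = sign(W). The input binarization and the K scaling map used in the full XNOR-Net procedure are omitted here.

```python
import numpy as np

def binary_weight_filter(W):
    """Approximate one convolution filter W as alpha * B with B in {-1, +1}.

    alpha = mean(|W|) minimizes ||W - alpha * B||^2 for B = sign(W), so a
    real-valued convolution can be replaced by a binary one followed by a
    single scalar multiplication.
    """
    B = np.sign(W)
    B[B == 0] = 1.0                 # avoid zeros in the binary code
    alpha = np.abs(W).mean()
    return alpha, B

W = np.random.randn(3, 3, 16)       # one 3x3 filter over 16 input channels
alpha, B = binary_weight_filter(W)
W_hat = alpha * B
```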
https://arxiv.org/abs/1609.00222 Ternary Neural Networks for Resource-Efficient AI Applications
We train these TNNs using a teacher-student approach. Using only ternary weights and ternary neurons with a two-threshold step activation function, the student ternary network learns to mimic the behaviour of its teacher network.
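A sketch of the two-threshold step activation for ternary neurons; the threshold values below are placeholders, since the paper derives them from the teacher network rather than fixing them by hand.

```python
import numpy as np

def ternary_step(x, lo=-0.5, hi=0.5):
    """Two-threshold step activation: -1 below `lo`, +1 above `hi`, else 0."""
    return np.where(x > hi, 1.0, np.where(x < lo, -1.0, 0.0))

x = np.linspace(-2.0, 2.0, 9)
print(ternary_step(x))
```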
https://petewarden.com/2016/05/03/how-to-quantize-neural-networks-with-tensorflow/
https://arxiv.org/abs/1609.08144v1 Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
http://openreview.net/pdf?id=S1_pAu9xl Trained Ternary Quantization
In this work, we extended a large deviation analysis of the solution space of a single layer neural network from the purely binary and balanced case [14] to the general discrete case.
https://arxiv.org/abs/1604.03058v5 Binarized Neural Networks on the ImageNet Classification Task
We trained Binarized Neural Networks (BNNs) on the high-resolution ImageNet ILSVRC-2012 classification task and achieved good performance. With a moderate-size network of 13 layers, we obtained a top-5 classification accuracy of 84.1% on the validation set through network distillation, much better than the previously published results of 73.2% for the XNOR network and 69.1% for binarized GoogLeNet.
https://arxiv.org/abs/1701.03400v2 Scaling Binarized Neural Networks on Reconfigurable Logic
The FINN framework was recently proposed for building fast and flexible field-programmable gate array (FPGA) accelerators for BNNs. FINN utilizes a novel set of optimizations that enable efficient mapping of BNNs to hardware and implements fully connected, non-padded convolutional, and pooling layers, with per-layer compute resources tailored to user-provided throughput requirements.
https://arxiv.org/abs/1702.03044v1 Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights
The method introduces three interdependent operations, namely weight partition, group-wise quantization, and re-training.
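An illustrative sketch of those operations, under assumptions not taken from the paper (the magnitude-based 50/50 split and the exponent range are placeholders): one partition of the weights is snapped to a power-of-two codebook, while the remaining full-precision partition stays free for re-training before the next round.

```python
import numpy as np

def quantize_pow2(w, n_min=-6, n_max=0):
    """Map a scalar weight to the codebook {0, +/-2^n : n_min <= n <= n_max}.

    Values too small for the codebook collapse to 0; the rest snap to the
    nearest power-of-two exponent. Exponent range is illustrative only.
    """
    if abs(w) < 2.0 ** n_min / 2:
        return 0.0
    n = int(np.clip(np.round(np.log2(abs(w))), n_min, n_max))
    return float(np.sign(w)) * 2.0 ** n

# Weight partition (illustrative): quantize the larger half of the weights,
# leave the rest at full precision so re-training can compensate.
w = np.random.randn(8)
mask = np.abs(w) >= np.median(np.abs(w))
w_q = np.array([quantize_pow2(x) if m else x for x, m in zip(w, mask)])
```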
https://arxiv.org/pdf/1705.09283v2.pdf Gated XNOR Networks: Deep Neural Networks with Ternary Weights and Activations under a Unified Discretization Framework
https://github.com/codeaudit/Gated-XNOR
https://arxiv.org/abs/1708.04788 BitNet: Bit-Regularized Deep Neural Networks
https://arxiv.org/abs/1709.06662v1 Verifying Properties of Binarized Deep Neural Networks
https://arxiv.org/pdf/1710.03740.pdf Mixed Precision Training
https://arxiv.org/abs/1711.01243v1 ResBinNet: Residual Binary Neural Network
https://arxiv.org/abs/1711.02213 Flexpoint: An Adaptive Numerical Format for Efficient Training of Deep Neural Networks
https://las.inf.ethz.ch/files/djolonga17learning.pdf Differentiable Learning of Submodular Models
In this paper we focus on the problem of submodular minimization, for which we show that such layers are indeed possible. The key idea is that we can continuously relax the output without sacrificing guarantees. We provide an easily computable approximation to the Jacobian complemented with a complete theoretical analysis. Finally, these contributions let us experimentally learn probabilistic log-supermodular models via a bi-level variational inference formulation.
https://openreview.net/pdf?id=HJGXzmspb Training and Inference with Integers in Deep Neural Networks https://github.com/boluoweifenda/WAGE
https://openreview.net/forum?id=S19dR9x0b Alternating Multi-bit Quantization for Recurrent Neural Networks
In this work, we address these problems by quantizing the network, both weights and activations, into multiple binary codes {-1, +1}. We formulate the quantization as an optimization problem. Under the key observation that, once the quantization coefficients are fixed, the binary codes can be derived efficiently via a binary search tree, alternating minimization is then applied.
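A small NumPy sketch of that alternating scheme for a weight vector, w ≈ Σ_i α_i b_i with b_i in {-1, +1}^n: with the codes fixed, the coefficients are a least-squares solve; with the coefficients fixed, the best code per entry is found here by enumerating all 2^k combinations (the paper uses a binary search tree instead of brute force; the iteration count and k are illustrative).

```python
import numpy as np
from itertools import product

def multibit_quantize(w, k=2, iters=5):
    """Approximate vector w as B @ alphas with B in {-1,+1}^(n x k)."""
    B = np.sign(np.tile(w, (k, 1)).T)                         # n x k initial codes
    B[B == 0] = 1.0
    combos = np.array(list(product([-1.0, 1.0], repeat=k)))   # all 2^k code rows
    alphas = np.zeros(k)
    for _ in range(iters):
        alphas, *_ = np.linalg.lstsq(B, w, rcond=None)        # fix codes, solve alphas
        vals = combos @ alphas                                # fix alphas, pick codes
        idx = np.argmin((w[:, None] - vals[None, :]) ** 2, axis=1)
        B = combos[idx]
    return alphas, B

w = np.random.randn(16)
alphas, B = multibit_quantize(w)
w_hat = B @ alphas    # multi-bit reconstruction of the weights
```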
https://arxiv.org/pdf/1712.05877.pdf Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
https://openreview.net/forum?id=B1IDRdeCW The High-Dimensional Geometry of Binary Neural Networks
Neural networks with binary weights and activations have similar performance to their continuous counterparts with substantially reduced execution time and power usage. We provide an experimentally verified theory for understanding how one can get away with such a massive reduction in precision based on the geometry of HD vectors. First, we show that binarization of high-dimensional vectors preserves their direction in the sense that the angle between a random vector and its binarized version is much smaller than the angle between two random vectors (Angle Preservation Property). Second, we take the perspective of the network and show that binarization approximately preserves weight-activation dot products (Dot Product Proportionality Property). More generally, when using a network compression technique, we recommend looking at the weight activation dot product histograms as a heuristic to help localize the layers that are most responsible for performance degradation. Third, we discuss the impacts of the low effective dimensionality of the data on the first layer of the network. We recommend either using continuous weights for the first layer or a Generalized Binarization Transformation. Such a transformation may be useful for architectures like LSTMs where the update for the hidden state declares a particular set of axes to be important (e.g. by taking the pointwise multiply of the forget gates with the cell state). Finally, we show that neural networks with ternary weights and activations can also be understood with our approach. More broadly speaking, our theory is useful for analyzing a variety of neural network compression techniques that transform the weights, activations or both to reduce the execution cost without degrading performance.
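The Angle Preservation Property is easy to check numerically; a sketch with an arbitrary dimension of 4096: a random Gaussian vector stays close in angle to its binarized version (about 37 degrees), while two independent random vectors are nearly orthogonal.

```python
import numpy as np

def angle(u, v):
    """Angle in degrees between two vectors."""
    c = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(c, -1.0, 1.0)))

d = 4096
rng = np.random.default_rng(0)
x = rng.standard_normal(d)
y = rng.standard_normal(d)

print(angle(x, np.sign(x)))   # ~37 degrees: binarization preserves direction
print(angle(x, y))            # ~90 degrees: independent random vectors
```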
https://arxiv.org/abs/1706.02021 Network Sketching: Exploiting Binary Structure in Deep CNNs
https://arxiv.org/abs/1806.08342v1 Quantizing deep convolutional networks for efficient inference: A whitepaper
https://arxiv.org/abs/1808.08784v1 Sparsity in Deep Neural Networks - An Empirical Investigation with TensorQuant https://github.com/cc-hpc-itwm/TensorQuant
https://arxiv.org/abs/1810.04714 Training Generative Adversarial Networks with Binary Neurons by End-to-end Backpropagation https://github.com/salu133445/binarygan
https://arxiv.org/abs/1809.09244 No Multiplication? No Floating Point? No Problem! Training Networks for Efficient Inference