Name Weight Quantization (aka Binarization)


Reduce memory requirements by using weights of lower precision.
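
As a rough illustration of the intent, the sketch below stores float32 weights as 8-bit codes plus two scalars, cutting weight memory by about 4x. The affine min/max scheme and the names `quantize_uint8` and `dequantize` are assumptions chosen for illustration. They are not taken from any of the papers listed under References.

```python
import numpy as np

def quantize_uint8(weights):
    """Affine (min/max) quantization of a float32 array to uint8 codes."""
    w_min, w_max = float(weights.min()), float(weights.max())
    # Guard against a constant tensor, where the range would be zero.
    scale = (w_max - w_min) / 255.0 or 1.0
    codes = np.round((weights - w_min) / scale).astype(np.uint8)
    return codes, scale, w_min

def dequantize(codes, scale, w_min):
    """Map uint8 codes back to approximate float32 weights."""
    return codes.astype(np.float32) * scale + w_min

rng = np.random.default_rng(0)
weights = rng.standard_normal((256, 256)).astype(np.float32)

codes, scale, w_min = quantize_uint8(weights)
restored = dequantize(codes, scale, w_min)

# The codes take one byte per weight instead of four.
memory_ratio = weights.nbytes / codes.nbytes
# Rounding to the nearest code bounds the error by about half a step.
max_error = float(np.abs(weights - restored).max())
print(memory_ratio, max_error, scale / 2)
```

Binarization is the extreme case of the same idea: each weight keeps only its sign, so it needs a single bit.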





Known Uses

Related Patterns


References

A Survey on Learning to Hash

The quantization approach can be derived from the distance-distance difference minimization criterion.

Deep neural networks are robust to weight binarization and other non-linear distortions

Layer Normalization

One way to reduce the training time is to normalize the activities of the neurons. In this paper, we transpose batch normalization into layer normalization by computing the mean and variance used for normalization from all of the summed inputs to the neurons in a layer on a single training case. Like batch normalization, we also give each neuron its own adaptive bias and gain which are applied after the normalization but before the non-linearity. Unlike batch normalization, layer normalization performs exactly the same computation at training and test times. It is also straightforward to apply to recurrent neural networks by computing the normalization statistics separately at each time step. Layer normalization is very effective at stabilizing the hidden state dynamics in recurrent networks. Empirically, we show that layer normalization can substantially reduce the training time compared with previously published techniques.
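
The computation described in this abstract can be sketched in a few lines. The function name `layer_norm`, the `eps` stabilizer, and the toy shapes are assumptions made for illustration. The abstract itself only specifies per-case statistics over a layer's summed inputs, followed by a per-neuron gain and bias.

```python
import numpy as np

def layer_norm(summed_inputs, gain, bias, eps=1e-5):
    """Normalize each training case over the neurons of one layer.

    summed_inputs: array of shape (cases, neurons).
    gain, bias: per-neuron parameters of shape (neurons,).
    eps: small constant for numerical stability (an assumption here).
    """
    # Statistics are computed per case, across the layer's neurons,
    # so the same computation applies at training and test time.
    mean = summed_inputs.mean(axis=-1, keepdims=True)
    var = summed_inputs.var(axis=-1, keepdims=True)
    normalized = (summed_inputs - mean) / np.sqrt(var + eps)
    # Gain and bias are applied after normalization; the
    # non-linearity would follow this step.
    return gain * normalized + bias

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 16)) * 3.0 + 2.0  # 4 cases, 16 neurons
out = layer_norm(a, gain=np.ones(16), bias=np.zeros(16))
print(out.mean(axis=-1), out.std(axis=-1))
```

For a recurrent network, the same function would be called separately at each time step, as the abstract notes.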

Invariance properties under the normalization methods.

Recurrent Neural Networks With Limited Numerical Precision

This paper addresses the question of how to best reduce weight precision during training in the case of RNNs. We present results from the use of different stochastic and deterministic reduced precision training methods applied to three major RNN types which are then tested on several datasets. The results show that the weight binarization methods do not work with the RNNs. However, the stochastic and deterministic ternarization, and pow2-ternarization methods gave rise to low-precision RNNs that produce similar and even higher accuracy on certain datasets therefore providing a path towards training more efficient implementations of RNNs in specialized hardware.

Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1

XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks

In XNOR-Networks, both the filters and the input to convolutional layers are binary. XNOR-Networks approximate convolutions using primarily binary operations.
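
To make the idea of "primarily binary operations" concrete, the sketch below computes a dot product of two {-1, +1} vectors with XNOR and a bit count instead of multiplications. The helper names `pack_signs` and `binary_dot` are hypothetical, and the real-valued scaling factors that XNOR-Net applies on top of the binary result are omitted. This illustrates the underlying identity only, not the paper's implementation.

```python
import numpy as np

def pack_signs(v):
    """Encode a {-1, +1} vector as an int; bit i is set when v[i] == +1."""
    bits = 0
    for i, x in enumerate(v):
        if x > 0:
            bits |= 1 << i
    return bits

def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed {-1, +1} vectors of length n."""
    mask = (1 << n) - 1
    # XNOR marks the positions where the two signs agree.
    xnor = ~(a_bits ^ b_bits) & mask
    matches = bin(xnor).count("1")  # population count
    # Each match contributes +1 and each mismatch -1.
    return 2 * matches - n

rng = np.random.default_rng(0)
a = rng.choice([-1, 1], size=64)
b = rng.choice([-1, 1], size=64)

result = binary_dot(pack_signs(a), pack_signs(b), n=64)
reference = int(np.dot(a, b))
print(result, reference)
```

A binary convolution applies this dot product at every filter position, which is why it maps well to bitwise hardware instructions.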

Figure: the procedure explained in section 3.2 of the paper for approximating a convolution using binary operations.

Ternary Neural Networks for Resource-Efficient AI Applications

We train these TNNs using a teacher-student approach. Using only ternary weights and ternary neurons, with a step activation function of two-thresholds, the student ternary network learns to mimic the behaviour of its teacher network.

Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation

Trained Ternary Quantization

In this work, we extended a large deviation analysis of the solution space of a single layer neural network from the purely binary and balanced case [14] to the general discrete case.

Binarized Neural Networks on the ImageNet Classification Task

We trained Binarized Neural Networks (BNNs) on the high-resolution ImageNet ILSVRC-2012 classification task and achieved good performance. With a moderate-size network of 13 layers, we obtained a top-5 classification accuracy of 84.1% on the validation set through network distillation, much better than the previously published results of 73.2% for the XNOR network and 69.1% for binarized GoogLeNet.

Scaling Binarized Neural Networks on Reconfigurable Logic

The Finn framework was recently proposed for building fast and flexible field programmable gate array (FPGA) accelerators for BNNs. Finn utilized a novel set of optimizations that enable efficient mapping of BNNs to hardware and implemented fully connected, non-padded convolutional and pooling layers, with per-layer compute resources being tailored to user-provided throughput requirements.

Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights

On one hand, we introduce three interdependent operations, namely weight partition, group-wise quantization and re-training.

Gated XNOR Networks: Deep Neural Networks with Ternary Weights and Activations under a Unified Discretization Framework

BitNet: Bit-Regularized Deep Neural Networks

Verifying Properties of Binarized Deep Neural Networks

Mixed Precision Training

ResBinNet: Residual Binary Neural Network

Flexpoint: An Adaptive Numerical Format for Efficient Training of Deep Neural Networks

Differentiable Learning of Submodular Models

In this paper we focus on the problem of submodular minimization, for which we show that such layers are indeed possible. The key idea is that we can continuously relax the output without sacrificing guarantees. We provide an easily computable approximation to the Jacobian complemented with a complete theoretical analysis. Finally, these contributions let us experimentally learn probabilistic log-supermodular models via a bi-level variational inference formulation.

The High-Dimensional Geometry of Binary Neural Networks

Training and Inference with Integers in Deep Neural Networks

Alternating Multi-bit Quantization for Recurrent Neural Networks

In this work, we address these problems by quantizing the network, both weights and activations, into multiple binary codes {-1,+1}. We formulate the quantization as an optimization problem. Under the key observation that once the quantization coefficients are fixed the binary codes can be derived efficiently by binary search tree, alternating minimization is then applied.

Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference

High-Dimensional Geometry of Binary Neural Networks

Neural networks with binary weights and activations have similar performance to their continuous counterparts with substantially reduced execution time and power usage. We provide an experimentally verified theory for understanding how one can get away with such a massive reduction in precision based on the geometry of HD vectors. First, we show that binarization of high-dimensional vectors preserves their direction in the sense that the angle between a random vector and its binarized version is much smaller than the angle between two random vectors (Angle Preservation Property). Second, we take the perspective of the network and show that binarization approximately preserves weight-activation dot products (Dot Product Proportionality Property). More generally, when using a network compression technique, we recommend looking at the weight activation dot product histograms as a heuristic to help localize the layers that are most responsible for performance degradation. Third, we discuss the impacts of the low effective dimensionality of the data on the first layer of the network. We recommend either using continuous weights for the first layer or a Generalized Binarization Transformation. Such a transformation may be useful for architectures like LSTMs where the update for the hidden state declares a particular set of axes to be important (e.g. by taking the pointwise multiply of the forget gates with the cell state). Finally, we show that neural networks with ternary weights and activations can also be understood with our approach. More broadly speaking, our theory is useful for analyzing a variety of neural network compression techniques that transform the weights, activations or both to reduce the execution cost without degrading performance. 
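
The Angle Preservation Property described above is easy to check numerically. The sketch below assumes standard Gaussian vectors and a dimension of 10,000, both chosen here for illustration. The helper `angle_degrees` is a hypothetical name and is not taken from the paper.

```python
import numpy as np

def angle_degrees(u, v):
    """Angle between two vectors, in degrees."""
    cosine = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # Clip to guard against rounding slightly outside [-1, 1].
    return float(np.degrees(np.arccos(np.clip(cosine, -1.0, 1.0))))

rng = np.random.default_rng(0)
dim = 10_000  # assumed dimension, for illustration only
x = rng.standard_normal(dim)
y = rng.standard_normal(dim)

# Angle between a random vector and its binarized (sign) version.
angle_to_binarized = angle_degrees(x, np.sign(x))
# Angle between two independent random vectors, for comparison.
angle_between_random = angle_degrees(x, y)
print(angle_to_binarized, angle_between_random)
```

For Gaussian vectors in high dimension, the first angle approaches arccos(sqrt(2/pi)), about 37 degrees, while two independent random vectors are close to orthogonal. That gap is what the abstract means by binarization preserving direction.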

Network Sketching: Exploiting Binary Structure in Deep CNNs

Quantizing deep convolutional networks for efficient inference: A whitepaper

Sparsity in Deep Neural Networks - An Empirical Investigation with TensorQuant

Training Generative Adversarial Networks with Binary Neurons by End-to-end Backpropagation

No Multiplication? No Floating Point? No Problem! Training Networks for Efficient Inference