weight_quantization [2017/12/21 02:39]
weight_quantization [2018/12/05 10:39] (current)
https://openreview.net/pdf?id=HJGXzmspb TRAINING AND INFERENCE WITH INTEGERS IN DEEP NEURAL NETWORKS https://github.com/boluoweifenda/WAGE
https://openreview.net/forum?id=S19dR9x0b Alternating Multi-bit Quantization for Recurrent Neural Networks
https://arxiv.org/pdf/1712.05877.pdf Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
https://openreview.net/forum?id=B1IDRdeCW The High-Dimensional Geometry of Binary Neural Networks
Neural networks with binary weights and activations have similar performance to their continuous counterparts with substantially reduced execution time and power usage. We provide an experimentally verified theory for understanding how one can get away with such a massive reduction in precision, based on the geometry of high-dimensional vectors. First, we show that binarization of high-dimensional vectors preserves their direction, in the sense that the angle between a random vector and its binarized version is much smaller than the angle between two random vectors (Angle Preservation Property). Second, we take the perspective of the network and show that binarization approximately preserves weight-activation dot products (Dot Product Proportionality Property). More generally, when using a network compression technique, we recommend looking at the weight-activation dot product histograms as a heuristic to help localize the layers that are most responsible for performance degradation. Third, we discuss the impact of the low effective dimensionality of the data on the first layer of the network. We recommend either using continuous weights for the first layer or a Generalized Binarization Transformation. Such a transformation may be useful for architectures like LSTMs, where the update for the hidden state declares a particular set of axes to be important (e.g. by taking the pointwise multiply of the forget gates with the cell state). Finally, we show that neural networks with ternary weights and activations can also be understood with our approach. More broadly speaking, our theory is useful for analyzing a variety of neural network compression techniques that transform the weights, activations, or both to reduce the execution cost without degrading performance.
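The two geometric properties described in the abstract above are easy to check numerically. A minimal NumPy sketch (the dimension, seed, and variable names are illustrative, not from the paper): for a high-dimensional Gaussian vector, binarization with `sign` rotates it by only about 37 degrees (cosine ≈ √(2/π)), while two independent random vectors are nearly orthogonal, and weight-activation dot products before and after binarization are strongly correlated.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4096  # high-dimensional, like a wide network layer

def angle(u, v):
    """Angle in degrees between two vectors."""
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

w = rng.standard_normal(d)
w_bin = np.sign(w)          # binarized weights in {-1, +1}
u = rng.standard_normal(d)  # an unrelated random vector

# Angle Preservation Property: binarization barely rotates a random
# high-dimensional Gaussian vector (expected angle ~ 37 degrees),
# while two independent random vectors are nearly orthogonal (~ 90).
print(angle(w, w_bin))  # ~ 37 degrees
print(angle(w, u))      # ~ 90 degrees

# Dot Product Proportionality Property: over many random activations x,
# x . w_bin is approximately proportional to x . w; their correlation
# equals the cosine of the binarization angle (~ 0.8 here).
X = rng.standard_normal((1000, d))
full = X @ w
binary = X @ w_bin
corr = np.corrcoef(full, binary)[0, 1]
print(corr)
```

A scatter plot of `full` against `binary` makes the proportionality visible layer by layer, which is the dot-product-histogram heuristic the abstract recommends for localizing compression damage.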
https://arxiv.org/abs/1706.02021 Network Sketching: Exploiting Binary Structure in Deep CNNs
https://arxiv.org/abs/1806.08342v1 Quantizing deep convolutional networks for efficient inference: A whitepaper
https://arxiv.org/abs/1808.08784v1 Sparsity in Deep Neural Networks - An Empirical Investigation with TensorQuant https://github.com/cc-hpc-itwm/TensorQuant
https://arxiv.org/abs/1810.04714 Training Generative Adversarial Networks with Binary Neurons by End-to-end Backpropagation https://github.com/salu133445/binarygan
https://arxiv.org/abs/1809.09244 No Multiplication? No Floating Point? No Problem! Training Networks for Efficient Inference