weight_quantization [2018/10/27 11:24]
https://openreview.net/pdf?id=HJGXzmspb TRAINING AND INFERENCE WITH INTEGERS IN DEEP NEURAL NETWORKS https://github.com/boluoweifenda/WAGE
https://openreview.net/forum?id=S19dR9x0b Alternating Multi-bit Quantization for Recurrent Neural Networks
https://arxiv.org/pdf/1712.05877.pdf Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
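The paper above uses an affine (asymmetric) scheme in which a real value r is represented by an integer q via r = scale * (q - zero_point). A minimal numpy sketch of that mapping (the function names here are illustrative, not from the paper's code):

```python
import numpy as np

def quantize(x, num_bits=8):
    """Affine quantization: map floats to uint8 so that
    x ≈ scale * (q - zero_point)."""
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)
    # zero_point is the integer that represents real 0.0 exactly.
    zero_point = int(np.clip(round(qmin - x.min() / scale), qmin, qmax))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return scale * (q.astype(np.float32) - zero_point)

x = np.random.randn(1000).astype(np.float32)
q, s, z = quantize(x)
err = np.abs(dequantize(q, s, z) - x).max()  # bounded by roughly one scale step
```

With 8 bits the round-trip error stays within about one quantization step, which is why integer-only inference can match float accuracy after quantization-aware training.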
https://openreview.net/forum?id=B1IDRdeCW The High-Dimensional Geometry of Binary Neural Networks
Neural networks with binary weights and activations have similar performance to their continuous counterparts with substantially reduced execution time and power usage. We provide an experimentally verified theory for understanding how one can get away with such a massive reduction in precision, based on the geometry of high-dimensional vectors. First, we show that binarization of high-dimensional vectors preserves their direction, in the sense that the angle between a random vector and its binarized version is much smaller than the angle between two random vectors (Angle Preservation Property). Second, we take the perspective of the network and show that binarization approximately preserves weight-activation dot products (Dot Product Proportionality Property). More generally, when using a network compression technique, we recommend looking at the weight-activation dot product histograms as a heuristic to help localize the layers that are most responsible for performance degradation. Third, we discuss the impact of the low effective dimensionality of the data on the first layer of the network. We recommend either using continuous weights for the first layer or a Generalized Binarization Transformation. Such a transformation may be useful for architectures like LSTMs, where the update for the hidden state declares a particular set of axes to be important (e.g. by taking the pointwise multiply of the forget gates with the cell state). Finally, we show that neural networks with ternary weights and activations can also be understood with our approach. More broadly speaking, our theory is useful for analyzing a variety of neural network compression techniques that transform the weights, activations, or both, to reduce the execution cost without degrading accuracy.
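The Angle Preservation Property from the abstract above is easy to check numerically: for Gaussian vectors in high dimension, the angle between a vector and its sign-binarized version concentrates near arccos(sqrt(2/pi)) ≈ 37°, while two independent random vectors are nearly orthogonal. A small sketch (illustrative, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4096  # high-dimensional, so angles concentrate
u = rng.standard_normal(d)
v = rng.standard_normal(d)

def angle_deg(a, b):
    # Angle between two vectors, in degrees.
    c = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.degrees(np.arccos(np.clip(c, -1.0, 1.0)))

b_u = np.sign(u)                 # binarization to {-1, +1}
angle_bin = angle_deg(u, b_u)    # small: binarization preserves direction
angle_rand = angle_deg(u, v)     # near 90°: independent vectors are ~orthogonal
```

The gap between `angle_bin` (~37°) and `angle_rand` (~90°) is the geometric reason binary weights can still point "the same way" as their continuous counterparts.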
https://arxiv.org/abs/1706.02021 Network Sketching: Exploiting Binary Structure in Deep CNNs
https://arxiv.org/abs/1806.08342v1 Quantizing deep convolutional networks for efficient inference: A whitepaper
https://arxiv.org/abs/1808.08784v1 Sparsity in Deep Neural Networks - An Empirical Investigation with TensorQuant https://github.com/cc-hpc-itwm/TensorQuant
https://arxiv.org/abs/1810.04714 Training Generative Adversarial Networks with Binary Neurons by End-to-end Backpropagation https://github.com/salu133445/binarygan