We propose a technique to reduce the parameters of a network by pruning weights during the initial training of the network. At the end of training, the parameters of the network are sparse while accuracy is still close to that of the original dense neural network. The network size is reduced by 8x and the time required to train the model remains constant. Additionally, we can prune a larger dense network to achieve better-than-baseline performance while still reducing the total number of parameters significantly. Pruning RNNs reduces the size of the model and can also help achieve significant inference-time speed-up using sparse matrix multiply. Benchmarks show that using our technique, model size can be reduced by 90% and speed-up is around 2x to 7x.
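
As a rough illustration of the idea above, the PyTorch sketch below interleaves magnitude-based weight pruning with ordinary training steps: weights whose magnitude falls below a gradually increasing threshold are zeroed and held at zero by a binary mask, so the network ends training sparse. The toy model, random data, linear threshold ramp and hyperparameters are assumptions for illustration, not the exact schedule of the technique described above.

<code python>
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy dense model standing in for the network being pruned.
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

# One binary mask per weight matrix; once a weight is pruned it stays at zero.
masks = {name: torch.ones_like(p)
         for name, p in model.named_parameters() if p.dim() > 1}

def threshold(step, total_steps, final_threshold=0.05):
    # Assumed linear ramp: prune nothing at first, more aggressively later.
    return final_threshold * min(1.0, step / total_steps)

total_steps = 1000
for step in range(total_steps):
    # Random data stands in for a real training batch.
    x = torch.randn(32, 128)
    y = torch.randint(0, 10, (32,))

    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

    # Prune after the weight update: zero small-magnitude weights and
    # keep previously pruned weights at zero via the mask.
    t = threshold(step, total_steps)
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks:
                masks[name] *= (p.abs() > t).float()
                p *= masks[name]

# Fraction of weights removed per layer at the end of training.
print({name: 1.0 - m.mean().item() for name, m in masks.items()})
</code>
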
https://arxiv.org/abs/1810.04622v1 Pruning neural networks: is it time to nip it in the bud?

First, when time-constrained, it is better to train a simple, smaller network from scratch than to prune a large network. Second, it is the architectures obtained through the pruning process, not the learnt weights, that prove valuable. Such architectures are powerful when trained from scratch. Furthermore, these architectures are easy to approximate without any further pruning: we can prune once and obtain a family of new, scalable network architectures for different memory requirements.
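
A minimal sketch of that last point, under the assumption that the result of pruning is summarised as per-layer keep-ratios: the ratios define a slimmed dense architecture that is rebuilt with fresh weights and trained from scratch, and a single scaling factor turns it into a family of architectures for different memory budgets. All layer sizes and ratios below are invented for illustration.

<code python>
import torch.nn as nn

# Original hidden widths and the fraction of units that survived pruning
# in each layer; both are made-up numbers for illustration.
base_widths = [256, 256, 128]
keep_ratios = [0.5, 0.25, 0.5]

def build(widths, in_dim=128, out_dim=10):
    # Build a fresh dense network with the given hidden widths, to be
    # trained from scratch rather than inheriting the pruned weights.
    layers, prev = [], in_dim
    for w in widths:
        layers += [nn.Linear(prev, w), nn.ReLU()]
        prev = w
    layers.append(nn.Linear(prev, out_dim))
    return nn.Sequential(*layers)

# Scaling the pruned widths by a single factor gives a family of
# architectures for different memory budgets, with no further pruning.
for budget in (1.0, 0.75, 0.5):
    widths = [max(1, int(w * r * budget))
              for w, r in zip(base_widths, keep_ratios)]
    model = build(widths)
    n_params = sum(p.numel() for p in model.parameters())
    print(budget, widths, n_params)
</code>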