Hardware Acceleration

Minerva: Enabling Low-Power, Highly-Accurate Deep Neural Network Accelerators

DaDianNao: A Machine-Learning Supercomputer

A Reconfigurable Low Power High Throughput Architecture for Deep Network Training

Cnvlutin: Ineffectual-Neuron-Free Deep Neural Network Computing

BIT-PRAGMATIC DEEP NEURAL NETWORK COMPUTING

Pragmatic (PRA) improves performance by 3.1x over the DaDianNao (DaDN) accelerator (Chen et al., 2014) and by 3.5x when DaDN uses an 8-bit quantized representation (Warden, 2016). DaDN was reported to be 300x faster than commodity graphics processors.
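
As a rough software illustration of what the "essential bit" content of an activation can mean, the sketch below multiplies a weight by an activation using one shift-and-add per non-zero activation bit. Reading "essential bits" as the 1-bits of a non-negative fixed-point activation is an interpretation, and the function names `essential_bit_offsets` and `shift_add_product` are hypothetical; this is not the accelerator's hardware design.

```python
def essential_bit_offsets(activation: int) -> list[int]:
    """Return the positions of the 1-bits in a non-negative integer activation."""
    if activation < 0:
        raise ValueError("this sketch assumes non-negative activations")
    offsets = []
    position = 0
    while activation:
        if activation & 1:
            offsets.append(position)
        activation >>= 1
        position += 1
    return offsets


def shift_add_product(weight: int, activation: int) -> int:
    """Multiply weight by activation with one shift-and-add per non-zero bit.

    Zero bits of the activation contribute nothing, so they are skipped.
    """
    return sum(weight << offset for offset in essential_bit_offsets(activation))


if __name__ == "__main__":
    activation = 0b00010001          # 8-bit value with only two non-zero bits
    print(essential_bit_offsets(activation))
    print(shift_add_product(7, activation) == 7 * activation)
```

In this model the work per multiplication scales with the number of set bits in the activation rather than with its bit width, which is the intuition behind skipping ineffectual bits.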

To the best of our knowledge, Pragmatic is the first DNN accelerator that exploits not only the per-layer precision requirements of CNNs but also the essential bit information content of the activation values. While this work targeted high-performance implementations, Pragmatic's core approach should be applicable to other hardware accelerators. We have investigated Pragmatic only for inference and with image classification convolutional neural networks. While desirable, applying the same concept to other network types and to layers other than convolutional ones is left for future work. It would also be interesting to study how the Pragmatic concepts can be applied to more general-purpose accelerators or even graphics processors.

RETHINKING NUMERICAL REPRESENTATIONS FOR DEEP NEURAL NETWORKS

SIGMA-DELTA QUANTIZED NETWORKS
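
A minimal sketch of the mechanism the Sigma-Delta abstract describes: a layer sends a discretized form of its change in activation, and the receiving layer accumulates those changes, so unchanged inputs cost nothing. The class names, the rounding-with-carried-residual scheme, and the operation counter are assumptions made for illustration; the paper's actual algorithm and its optimization method are not reproduced here.

```python
class SigmaDeltaEncoder:
    """Turns a stream of activation vectors into integer change signals."""

    def __init__(self, size: int):
        self.previous = [0.0] * size   # last activation seen per unit
        self.residual = [0.0] * size   # rounding error carried to the next step

    def encode(self, activations):
        deltas = []
        for i, value in enumerate(activations):
            # Accumulate the change since the previous input, then discretize.
            self.residual[i] += value - self.previous[i]
            self.previous[i] = value
            step = round(self.residual[i])
            self.residual[i] -= step
            deltas.append(step)
        return deltas


class IntegratingLinearLayer:
    """Linear layer that updates its output only for non-zero input changes."""

    def __init__(self, weights):
        self.weights = weights                 # weights[j][i]: input i -> output j
        self.state = [0.0] * len(weights)      # running sum of all updates
        self.operations = 0                    # multiply-adds actually performed

    def receive(self, deltas):
        for i, delta in enumerate(deltas):
            if delta == 0:
                continue                       # no change, no computation
            for j, row in enumerate(self.weights):
                self.state[j] += row[i] * delta
                self.operations += 1
        return list(self.state)


if __name__ == "__main__":
    encoder = SigmaDeltaEncoder(size=2)
    layer = IntegratingLinearLayer([[1.0, 2.0], [0.0, 1.0]])
    for frame in ([3.0, 1.0], [3.0, 1.0], [4.0, 1.0]):
        output = layer.receive(encoder.encode(frame))
        print(frame, output, layer.operations)
```

In this toy setting a repeated frame adds no multiply-adds at all, which mirrors the claim that computation scales with the amount of change rather than with network size.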

Deep neural networks can be obscenely wasteful. When processing video, a convolutional network expends a fixed amount of computation for each frame with no regard to the similarity between neighbouring frames. As a result, it ends up repeatedly doing very similar computations. To put an end to such waste, we introduce Sigma-Delta networks. With each new input, each layer in this network sends a discretized form of its change in activation to the next layer. Thus the amount of computation that the network does scales with the amount of change in the input and layer activations, rather than the size of the network. We introduce an optimization method for converting any pre-trained deep network into an optimally efficient Sigma-Delta network, and show that our algorithm, if run on the appropriate hardware, could cut at least an order of magnitude from the computational cost of processing video data.

Fast and Efficient Asynchronous Neural Computation with Adapting Spiking Neural Networks

Local Binary Convolutional Neural Networks
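
A small sketch of the local binary convolution (LBC) layer structure named in the Local Binary Convolutional Neural Networks abstract: fixed sparse filters, a non-linearity, then a weighted combination of the activated responses. The filter values in {-1, 0, +1}, the ReLU non-linearity, the single-channel input, and all function names are assumptions for illustration; training of the linear weights is not shown.

```python
import random


def make_fixed_filters(count, size, nonzero_fraction, seed=0):
    """Generate fixed sparse filters with entries in {-1, 0, +1} (assumed form)."""
    rng = random.Random(seed)
    filters = []
    for _ in range(count):
        kernel = [[0] * size for _ in range(size)]
        for r in range(size):
            for c in range(size):
                if rng.random() < nonzero_fraction:
                    kernel[r][c] = rng.choice((-1, 1))
        filters.append(kernel)
    return filters


def correlate_valid(image, kernel):
    """Sliding-window filter response without padding (cross-correlation)."""
    k = len(kernel)
    rows = len(image) - k + 1
    cols = len(image[0]) - k + 1
    return [
        [
            sum(kernel[a][b] * image[r + a][c + b] for a in range(k) for b in range(k))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def lbc_layer(image, fixed_filters, linear_weights):
    """Fixed filters -> ReLU -> per-pixel weighted sum of the activated maps."""
    responses = [correlate_valid(image, kernel) for kernel in fixed_filters]
    activated = [[[max(0.0, v) for v in row] for row in m] for m in responses]
    rows, cols = len(activated[0]), len(activated[0][0])
    return [
        [sum(w * m[r][c] for w, m in zip(linear_weights, activated)) for c in range(cols)]
        for r in range(rows)
    ]


if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    filters = make_fixed_filters(count=4, size=2, nonzero_fraction=0.5)
    weights = [0.25, 0.25, 0.25, 0.25]   # the only parameters that would be learned
    print(lbc_layer(image, filters, weights))
```

Only `linear_weights` would be updated during training in this sketch; the filters stay fixed after they are generated, matching the description that they are not updated during the training process.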

We propose local binary convolution (LBC), an efficient alternative to convolutional layers in standard convolutional neural networks (CNNs). The design principles of LBC are motivated by local binary patterns (LBP). The LBC layer comprises a set of fixed, sparse, pre-defined binary convolutional filters that are not updated during the training process, a non-linear activation function, and a set of learnable linear weights. The linear weights combine the activated filter responses to approximate the corresponding activated filter responses of a standard convolutional layer.

Cnvlutin2: Ineffectual-Activation-and-Weight-Free Deep Neural Network Computing

Efficient Processing of Deep Neural Networks: A Tutorial and Survey