Operational Calculus on Programming Spaces

In this paper we develop operational calculus on programming spaces that generalizes existing approaches to automatic differentiation of computer programs and provides a rigorous framework for program analysis through calculus. We present an abstract computing machine that models automatically differentiable computer programs. Computer programs are viewed as maps on a finite dimensional vector space called virtual memory space, which we extend by the tensor algebra of its dual to accommodate derivatives. The extended virtual memory is by itself an algebra of programs, a data structure one can calculate with, and its elements give the expansion of the original program as an infinite tensor series at program's input values. We define the operator of differentiation on programming spaces and implement higher order derivatives as well as a generalized shift operator in terms of its powers. Our approach offers a powerful tool for program analysis and approximation as well as a unified approach to automatic differentiation covering both forward and reverse mode of arbitrary order under a single operator. Several possible applications to computer science are presented, most notably trainable general tensor neural networks that can provide a meaningful way of neural network initialization and enable generalization of the existing state of the art methods for analyzing neural networks to any computer program, and vice versa. Shift: A Zero FLOP, Zero Parameter Alternative to Spatial Convolutions

We fuse shifts and point-wise convolutions to construct end-to-end trainable shift-based modules, with a hyperparameter characterizing the tradeoff between accuracy and efficiency. To demonstrate the operation's efficacy, we replace ResNet's 3×3 convolutions with shift-based modules for improved CIFAR10 and CIFAR100 accuracy using 60% fewer parameters; we additionally demonstrate the operation's resilience to parameter reduction on ImageNet, outperforming ResNet family members. We finally show the shift operation's applicability across domains, achieving strong performance with fewer parameters on classification, face verification and style transfer.