# In Layer Transform

**Aliases** Spatial Transform Layer

**Intent**

Perform a classification-invariant transformation that is driven by model parameters.

**Motivation**

**Structure**

<Diagram>

**Discussion**

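This differs from data augmentation in that the network itself learns which transformation to apply (for example, a projection), rather than having a fixed set of transformations applied to the training data in advance.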
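As a rough illustration of a transformation driven by parameters, the sketch below applies a 2x3 affine matrix `theta` to a single-channel image using a sampling grid and bilinear interpolation, in the spirit of the spatial transformer references listed further down. It is a minimal pure-Python sketch, not any library's implementation: the function names (`affine_grid`, `bilinear_sample`, `spatial_transform`) are hypothetical, coordinates are assumed to be normalised to [-1, 1], and `theta` is passed in by hand where a real layer would have a small sub-network predict it from the input.

```python
import math

def affine_grid(theta, height, width):
    """For each output pixel, compute the source coordinate (x, y) in [-1, 1]
    given by the 2x3 affine matrix theta."""
    grid = []
    for i in range(height):
        y = -1.0 + 2.0 * i / (height - 1) if height > 1 else 0.0
        row = []
        for j in range(width):
            x = -1.0 + 2.0 * j / (width - 1) if width > 1 else 0.0
            xs = theta[0][0] * x + theta[0][1] * y + theta[0][2]
            ys = theta[1][0] * x + theta[1][1] * y + theta[1][2]
            row.append((xs, ys))
        grid.append(row)
    return grid

def bilinear_sample(image, grid):
    """Sample image at the grid's source coordinates with bilinear weights.
    Pixels outside the image are treated as zero (an assumption of this sketch)."""
    h, w = len(image), len(image[0])

    def pixel(r, c):
        return image[r][c] if 0 <= r < h and 0 <= c < w else 0.0

    out = []
    for row in grid:
        out_row = []
        for xs, ys in row:
            # Convert normalised coordinates back to pixel coordinates.
            px = (xs + 1.0) * (w - 1) / 2.0
            py = (ys + 1.0) * (h - 1) / 2.0
            x0, y0 = math.floor(px), math.floor(py)
            dx, dy = px - x0, py - y0
            out_row.append(
                pixel(y0, x0) * (1 - dx) * (1 - dy)
                + pixel(y0, x0 + 1) * dx * (1 - dy)
                + pixel(y0 + 1, x0) * (1 - dx) * dy
                + pixel(y0 + 1, x0 + 1) * dx * dy
            )
        out.append(out_row)
    return out

def spatial_transform(image, theta):
    """Warp image with theta; the output has the same size as the input."""
    return bilinear_sample(image, affine_grid(theta, len(image), len(image[0])))

image = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
mirror = [[-1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # horizontal flip
unchanged = spatial_transform(image, identity)
flipped = spatial_transform(image, mirror)
```

Because the output is a weighted sum of input pixels with weights that depend smoothly on `theta`, gradients can flow back to whatever produces `theta`. That is what lets the transformation be learned jointly with the classifier instead of being chosen beforehand.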

**Known Uses**

**Related Patterns**

<Diagram>

**References**

https://kevinzakka.github.io/2017/01/10/stn-part1/

http://arxiv.org/pdf/1602.02660v1.pdf Exploiting Cyclic Symmetry in Convolutional Neural Networks

We introduce four operations which can be inserted into neural network models as layers, and which can be combined to make these models partially equivariant to rotations.

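| Name | Definition | Batch size | # Feature maps |
|---|---|---|---|
| Slice | $S(x) = [x,\; rx,\; r^2x,\; r^3x]^T$ | ×4 | unchanged |
| Pool | $P(x) = p(x_0,\; r^{-1}x_1,\; r^{-2}x_2,\; r^{-3}x_3)$ | ÷4 | unchanged |
| Stack | $T(x) = [x_0,\; r^{-1}x_1,\; r^{-2}x_2,\; r^{-3}x_3]$ | ÷4 | ×4 |
| Roll | $R(x) = [T(x),\; T(\sigma x),\; T(\sigma^2 x),\; T(\sigma^3 x)]^T$ | unchanged | ×4 |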
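The four operations listed above can be sketched in a few lines of plain Python. This is one reading of the definitions, not the paper's code. A batch is assumed to be a list of samples, each sample a list of square 2D feature maps (nested lists). `r` is taken to be a 90° counterclockwise rotation, realignment uses the inverse rotation, the pooling function `p` is taken to be the mean, and the cyclic shift in Roll moves the four batch parts forward by one position. All function names are hypothetical.

```python
def rot90(fmap, k=1):
    """Rotate a 2D feature map counterclockwise by k * 90 degrees (k may be negative)."""
    for _ in range(k % 4):
        fmap = [list(row) for row in zip(*fmap)][::-1]
    return fmap

def rotate_batch(batch, k):
    """Rotate every feature map of every sample in the batch."""
    return [[rot90(fmap, k) for fmap in sample] for sample in batch]

def split4(batch):
    """Split the batch into the four consecutive parts x_0 .. x_3."""
    n = len(batch) // 4
    return [batch[i * n:(i + 1) * n] for i in range(4)]

def realign(batch):
    """Undo the rotation of each part: x_0, r^-1 x_1, r^-2 x_2, r^-3 x_3."""
    return [rotate_batch(part, -k) for k, part in enumerate(split4(batch))]

def slice_op(batch):
    """Slice: append the three rotated copies along the batch axis (batch x4)."""
    return [sample for k in range(4) for sample in rotate_batch(batch, k)]

def pool_op(batch):
    """Pool: realign the four parts and average them (batch /4)."""
    pooled = []
    for samples in zip(*realign(batch)):  # four versions of one sample
        pooled.append([
            [[sum(vals) / 4.0 for vals in zip(*rows)] for rows in zip(*fmaps)]
            for fmaps in zip(*samples)    # four versions of one feature map
        ])
    return pooled

def stack_op(batch):
    """Stack: realign the four parts and concatenate their feature maps
    (batch /4, feature maps x4)."""
    return [[fmap for sample in samples for fmap in sample]
            for samples in zip(*realign(batch))]

def roll_op(batch):
    """Roll: stack every cyclic shift of the four parts
    (batch unchanged, feature maps x4)."""
    parts = split4(batch)
    out = []
    for k in range(4):
        shifted = parts[k:] + parts[:k]
        out.extend(stack_op([s for part in shifted for s in part]))
    return out

x = [[[[1, 2], [3, 4]]]]   # batch of 1 sample with 1 feature map
sliced = slice_op(x)       # 4 samples, 1 feature map each
pooled = pool_op(sliced)   # 1 sample, 1 feature map
stacked = stack_op(sliced) # 1 sample, 4 feature maps
rolled = roll_op(sliced)   # 4 samples, 4 feature maps each
```

In this toy case, pooling a sliced batch recovers the original input, because each rotated copy is rotated back before averaging. In a real network, convolutional layers sit between Slice and Pool, so the four paths carry different activations.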

http://torch.ch/blog/2015/09/07/spatial_transformers.html

http://arxiv.org/pdf/1506.02025.pdf Spatial Transformer Networks

http://smerity.com/articles/2016/architectures_are_the_new_feature_engineering.html

http://www.ee.cuhk.edu.hk/~xgwang/papers/ouyangWiccv13.pdf Joint Deep Learning for Pedestrian Detection

This paper proposes that [feature extraction, deformation handling, occlusion handling, and classification] should be jointly learned in order to maximize their strengths through cooperation. We formulate these four components into a joint deep learning framework and propose a new deep network architecture.

http://arxiv.org/abs/1607.07405 gvnn: Neural Network Library for Geometric Computer Vision

We introduce gvnn, a neural network library in Torch aimed towards bridging the gap between classic geometric computer vision and deep learning. Inspired by the recent success of Spatial Transformer Networks, we propose several new layers which are often used as parametric transformations on the data in geometric computer vision. These layers can be inserted within a neural network much in the spirit of the original spatial transformers and allow backpropagation to enable end-to-end learning of a network involving any domain knowledge in geometric computer vision. This opens up applications in learning invariance to 3D geometric transformation for place recognition, end-to-end visual odometry, depth estimation and unsupervised learning through warping with a parametric transformation for image reconstruction error.

https://arxiv.org/abs/1612.03897v1 Inverse Compositional Spatial Transformer Networks

In this paper, we establish a theoretical connection between the classical Lucas & Kanade (LK) algorithm and the emerging topic of Spatial Transformer Networks (STNs). STNs are of interest to the vision and learning communities due to their natural ability to combine alignment and classification within the same theoretical framework. Inspired by the Inverse Compositional (IC) variant of the LK algorithm, we present Inverse Compositional Spatial Transformer Networks (IC-STNs). We demonstrate that IC-STNs can achieve better performance than conventional STNs with less model capacity; in particular, we show superior performance in pure image alignment tasks as well as joint alignment/classification problems on real-world problems.

https://arxiv.org/pdf/1708.07199v1.pdf 3D Morphable Models as Spatial Transformer Networks

This is an extension of the original spatial transformer network in that we are able to interpret and normalise 3D pose changes and self-occlusions. The trained localisation part of the network is independently useful since it learns to fit a 3D morphable model to a single image. We show that the localiser can be trained using only simple geometric loss functions on a relatively small dataset yet is able to perform robust normalisation on highly uncontrolled images including occlusion, self-occlusion and large pose changes.

https://arxiv.org/pdf/1709.01889.pdf Polar Transformer Networks