Self-similarity

Also Known As

Multi-scale

Intent

Create layers within a layer to improve the precision of the representation mapping.

Motivation

How can we improve the accuracy of a layer?

Structure

<Diagram>

Discussion

This is a rather intriguing property of DL systems that has only recently been uncovered. Deep networks are typically characterized by width (the number of neurons in a layer) and depth (the number of layers in the network). However, it has been shown that each layer may itself have a recursive definition. That is, you may have layers within layers.
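As a minimal sketch of what a layer-within-a-layer can look like (assuming PyTorch, with the hidden width and inner depth chosen arbitrarily for illustration), the module below presents itself to the surrounding network as a single dim-to-dim layer while internally refining its mapping through a small stack of sub-layers:

```python
import torch
import torch.nn as nn

class LayerOfLayers(nn.Module):
    """A 'layer' that is internally a small stack of sub-layers.

    Seen from outside it maps dim -> dim exactly like a single dense
    layer, but internally the mapping is refined over several steps.
    """
    def __init__(self, dim, inner_depth=3):
        super().__init__()
        self.sub_layers = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
             for _ in range(inner_depth)]
        )

    def forward(self, x):
        for sub_layer in self.sub_layers:
            x = sub_layer(x)
        return x

# Drop-in replacement for what would otherwise be one hidden layer.
net = nn.Sequential(nn.Linear(32, 64), LayerOfLayers(64), nn.Linear(64, 10))
out = net(torch.randn(8, 32))   # shape: (8, 10)
```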

This was first hinted at by the discovery that deep residual networks behave like much shallower networks, in which ensembles of layers contribute results equivalent to those of a single layer. Furthermore, the demonstrated equivalence between deep residual networks and RNNs also supports the idea that multiple layers contribute as an ensemble of models. Conventionally, layers are meant to create progressively more abstract representations as they are stacked on top of each other. The idea of a model being diffused across several layers that capture the same level of abstraction is indeed new. What it implies is that each layer can be structured recursively into several sub-layers that all contribute to the same representation.
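To make the ensemble view of residual stacks concrete, here is a small numeric check (my own construction; the residual branches are kept linear so that the expansion into paths is exact rather than approximate, which is not the case for real networks with nonlinearities):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4)
W1 = rng.normal(size=(4, 4)) * 0.1   # residual branch of block 1 (linear)
W2 = rng.normal(size=(4, 4)) * 0.1   # residual branch of block 2 (linear)

f1 = lambda v: W1 @ v
f2 = lambda v: W2 @ v

# Two stacked residual blocks: y = x + f(x)
y1 = x + f1(x)
y2 = y1 + f2(y1)

# The same computation unrolled into 2**2 = 4 paths of different
# lengths: identity, f1 alone, f2 alone, and f2 composed with f1.
ensemble = x + f1(x) + f2(x) + f2(f1(x))

assert np.allclose(y2, ensemble)
```

With n blocks the same expansion yields 2^n paths of varying length, which is the ensemble interpretation referred to above.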

One could in fact look at a single neuron and recognize that it performs the same kind of function as an entire layer of neurons. The key differentiator is that the single neuron does not gain the benefit of an ensemble-based activation. A layer consisting of layers enables exactly this kind of capability, and it is highlighted in recent research on FractalNets.
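A simplified sketch of the fractal expansion rule follows (assuming PyTorch; 3x3 convolutions and an element-wise mean join are my placeholder choices, and details of the published FractalNet such as drop-path, channel growth, and pooling between blocks are omitted): f_1 is a single conv block, and f_{k+1} joins f_k composed with f_k against a parallel conv block.

```python
import torch
import torch.nn as nn

def conv_block(channels):
    # Base case: one convolution plus nonlinearity.
    return nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())

class FractalBlock(nn.Module):
    """Sketch of the expansion rule: f1 = conv,
    f_{k+1}(x) = mean( f_k(f_k(x)), conv(x) )."""
    def __init__(self, channels, k):
        super().__init__()
        self.short = conv_block(channels)
        self.deep = None
        if k > 1:
            self.deep = nn.Sequential(FractalBlock(channels, k - 1),
                                      FractalBlock(channels, k - 1))

    def forward(self, x):
        if self.deep is None:
            return self.short(x)
        # The join averages the two sub-paths: an ensemble-like combination
        # of a short path and a recursively deeper path.
        return torch.stack([self.deep(x), self.short(x)]).mean(dim=0)

block = FractalBlock(channels=16, k=3)
y = block(torch.randn(1, 16, 8, 8))   # shape: (1, 16, 8, 8)
```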

What benefit does Network in Network provide?
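The question is left open here, but the structure involved is easy to sketch. Network in Network replaces the linear convolutional filter with a small micro-network, an MLP evaluated at every spatial position, which can be implemented with 1x1 convolutions; the intended benefit is a more expressive, nonlinear mapping within each layer. A minimal sketch (PyTorch, with channel counts chosen arbitrarily):

```python
import torch
import torch.nn as nn

def mlpconv(in_channels, out_channels):
    """NiN-style block: a conv followed by 1x1 convs, i.e. a small MLP
    (a 'network in the network') evaluated at every spatial position."""
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv2d(out_channels, out_channels, kernel_size=1), nn.ReLU(),
        nn.Conv2d(out_channels, out_channels, kernel_size=1), nn.ReLU(),
    )

y = mlpconv(3, 16)(torch.randn(1, 3, 32, 32))   # shape: (1, 16, 32, 32)
```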

Known Uses

Network in Network

FractalNet

RNN

Related Patterns

<Diagram>

Relationship to Canonical Patterns

Cited by these patterns:

References

http://arxiv.org/abs/1605.07648v1 FractalNet: Ultra-Deep Neural Networks without Residuals

Repeated application of a single expansion rule generates an extremely deep network whose structural layout is precisely a truncated fractal. Such a network contains interacting subpaths of different lengths, but does not include any pass-through connections: every internal signal is transformed by a filter and nonlinearity before being seen by subsequent layers.

This property stands in stark contrast to the current approach of explicitly structuring very deep networks so that training is a residual learning problem.

A fractal design achieves an error rate of 22.85% on CIFAR-100, matching the state-of-the-art held by residual networks.

Fractal networks exhibit intriguing properties beyond their high performance. They can be regarded as a computationally efficient implicit union of subnetworks of every depth.

FractalNet demonstrates that path length is fundamental for training ultra-deep neural networks; residuals are incidental. Key is the shared characteristic of FractalNet and ResNet: large nominal network depth, but effectively shorter paths for gradient propagation during training. Fractal architectures are arguably the simplest means of satisfying this requirement, and match or exceed ResNet’s experimental performance. They are resistant to being too deep; extra depth may slow training, but does not impair accuracy.

With drop-path, regularization of extremely deep fractal networks is intuitive and effective. Drop-path doubles as a method of enforcing latency/accuracy tradeoffs within fractal networks, for applications where fast answers have utility (a sketch of a drop-path join appears after this summary).

Our analysis connects the emergent internal behavior of fractal networks with phenomena built into other designs. Their substructure is similar to hand-designed modules used as building blocks in some convolutional networks. Their training evolution may emulate deep supervision and student-teacher learning.
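As a concrete illustration of the drop-path mechanism described above, here is a minimal sketch of the local variant only (assuming PyTorch and a mean join; the drop probability is a placeholder, and the paper's global variant, which samples a single column for the whole network, is not shown):

```python
import torch

def drop_path_join(paths, drop_prob=0.15, training=True):
    """Local drop-path (sketch): at a join, drop each incoming path with
    probability drop_prob, keep at least one, and average the survivors."""
    if not training or drop_prob == 0.0:
        return torch.stack(paths).mean(dim=0)
    keep = torch.rand(len(paths)) > drop_prob
    if not keep.any():                          # guarantee one survivor
        keep[torch.randint(len(paths), (1,))] = True
    survivors = [p for p, k in zip(paths, keep) if k]
    return torch.stack(survivors).mean(dim=0)

a, b = torch.randn(2, 16, 8, 8), torch.randn(2, 16, 8, 8)
out = drop_path_join([a, b])                    # shape: (2, 16, 8, 8)
```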

https://pseudoprofound.wordpress.com/2016/06/20/recursive-not-recurrent-neural-nets-in-tensorflow

http://arxiv.org/abs/1510.05711v2 Qualitative Projection Using Deep Neural Networks

http://arxiv.org/abs/1609.01704 Hierarchical Multiscale Recurrent Neural Networks

In this paper, we propose a novel multiscale approach, called the hierarchical multiscale recurrent neural networks, which can capture the latent hierarchical structure in the sequence by encoding the temporal dependencies with different timescales using a novel update mechanism. We show some evidence that our proposed multiscale architecture can discover underlying hierarchical structure in the sequences without using explicit boundary information. We evaluate our proposed model on character-level language modelling and handwriting sequence modelling.

https://arxiv.org/pdf/1609.09106v1.pdf HyperNetworks