https://arxiv.org/abs/1703.06846 Boosting Dilated Convolutional Networks with Mixed Tensor Decompositions

In this paper we study the expressive efficiency brought forth by the architectural feature of connectivity, motivated by the observation that nearly all state-of-the-art networks these days employ elaborate connection schemes, running layers in parallel while splitting and merging them in various ways. A formal treatment of this question would shed light on the effectiveness of modern connectivity schemes, and in addition, could provide new tools for network design. We focus on dilated convolutional networks, a family of deep models gaining increased attention, underlying state-of-the-art architectures like Google's WaveNet and ByteNet. By introducing and studying the concept of mixed tensor decompositions, we prove that interconnecting dilated convolutional networks can lead to expressive efficiency. In particular, we show that a single connection between intermediate layers can already lead to an almost quadratic gap, which in large-scale settings typically makes the difference between a model that is practical and one that is not.
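
The paper's result is theoretical (expressive efficiency of mixed tensor decompositions), but the architectural idea is concrete. Below is a hypothetical PyTorch sketch, not the paper's construction: two WaveNet-style dilated stacks run in parallel, with a single cross-connection between intermediate layers. The names, the mixing point (halfway up), and the additive merge are all assumptions made for illustration.

<code python>
import torch
import torch.nn as nn
import torch.nn.functional as F


def dilated_stack(channels, depth):
    # WaveNet-style 1-D convolutions with dilations 1, 2, 4, ...
    return nn.ModuleList([
        nn.Conv1d(channels, channels, kernel_size=2, dilation=2 ** i)
        for i in range(depth)])


class MixedDilatedNet(nn.Module):
    """Two dilated convolutional stacks run in parallel; a single
    connection feeds stack A's intermediate features into stack B
    halfway up (hypothetical choice), mixing the two otherwise
    independent networks."""

    def __init__(self, channels=32, depth=6):
        super().__init__()
        self.a = dilated_stack(channels, depth)
        self.b = dilated_stack(channels, depth)
        self.mix_at = depth // 2  # assumed mixing point

    def forward(self, x):                      # x: (N, channels, T)
        xa = xb = x
        for i, (ca, cb) in enumerate(zip(self.a, self.b)):
            d = ca.dilation[0]                 # left-pad keeps convs causal
            xa = torch.relu(ca(F.pad(xa, (d, 0))))
            xb = torch.relu(cb(F.pad(xb, (d, 0))))
            if i == self.mix_at:               # the single interconnection
                xb = xb + xa
        return xa + xb                         # merge the two stacks
</code>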

https://arxiv.org/pdf/1710.10348v1.pdf Multi-Level Residual Networks from Dynamical Systems View

https://arxiv.org/pdf/1709.01507.pdf Squeeze-and-Excitation Networks

In this work, we focus on channels and propose a novel architectural unit, which we term the “Squeeze-and-Excitation” (SE) block, that adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels.
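
A minimal PyTorch sketch of an SE block as described in the paper: global average pooling squeezes each channel to a scalar, a two-layer bottleneck (reduction ratio r; the paper's default is r=16) produces per-channel gates, and the input is rescaled channel-wise. How the block is inserted into a host architecture (e.g. before the residual addition of a ResNet block) follows the paper, not this sketch.

<code python>
import torch
import torch.nn as nn


class SEBlock(nn.Module):
    """Squeeze-and-Excitation block: global average pool ("squeeze"),
    two FC layers with a bottleneck ("excitation"), sigmoid gating,
    then channel-wise rescaling of the input feature map."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc1 = nn.Linear(channels, channels // reduction)
        self.fc2 = nn.Linear(channels // reduction, channels)

    def forward(self, x):                   # x: (N, C, H, W)
        s = x.mean(dim=(2, 3))              # squeeze: (N, C)
        s = torch.relu(self.fc1(s))         # excitation bottleneck
        s = torch.sigmoid(self.fc2(s))      # per-channel gates in (0, 1)
        return x * s[:, :, None, None]      # recalibrate channel responses
</code>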

https://arxiv.org/pdf/1711.07971.pdf Non-local Neural Networks

In this paper, we present non-local operations as a generic family of building blocks for capturing long-range dependencies. Inspired by the classical non-local means method [4] in computer vision, our non-local operation computes the response at a position as a weighted sum of the features at all positions. This building block can be plugged into many computer vision architectures.
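
A rough PyTorch sketch of the embedded-Gaussian variant of the non-local block: the weighted sum is a softmax over pairwise dot products of embedded features, and the residual wrapper is what lets the block be plugged into an existing architecture. The channel bottleneck (halving) follows the paper's default; the rest is a simplification for illustration.

<code python>
import torch
import torch.nn as nn


class NonLocalBlock(nn.Module):
    """Embedded-Gaussian non-local block: the response at each position
    is a softmax-weighted sum of features at all positions, wrapped in
    a residual connection so it can be inserted into existing nets."""

    def __init__(self, channels):
        super().__init__()
        inter = channels // 2               # bottleneck, as in the paper
        self.theta = nn.Conv2d(channels, inter, 1)
        self.phi = nn.Conv2d(channels, inter, 1)
        self.g = nn.Conv2d(channels, inter, 1)
        self.out = nn.Conv2d(inter, channels, 1)

    def forward(self, x):                   # x: (N, C, H, W)
        n, c, h, w = x.shape
        q = self.theta(x).flatten(2)        # (N, C', HW)
        k = self.phi(x).flatten(2)          # (N, C', HW)
        v = self.g(x).flatten(2)            # (N, C', HW)
        # attn[i, j]: weight of position j in the response at position i
        attn = torch.softmax(q.transpose(1, 2) @ k, dim=-1)   # (N, HW, HW)
        y = (v @ attn.transpose(1, 2)).view(n, -1, h, w)
        return x + self.out(y)              # residual: plug-in block
</code>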

https://arxiv.org/abs/1710.04773v2 Residual Connections Encourage Iterative Inference

Resnets are able to perform both representation learning and iterative refinement. In general, a Resnet block tends to concentrate representation learning behavior in the first few layers, while higher layers perform iterative refinement of features. Finally, we observe that naively sharing residual layers leads to representation explosion and, counterintuitively, overfitting, and we show that simple existing strategies can help alleviate this problem.

An unshared Batch Normalization strategy therefore mitigates this exploding activation problem.
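
A hedged sketch of that mitigation: one residual transform is applied repeatedly with shared convolution weights, but each unrolled step gets its own BatchNorm, so per-step running statistics and affine parameters are not shared. This only illustrates the unshared-BN idea, not the paper's exact architecture; the layer sizes and step count are placeholders.

<code python>
import torch
import torch.nn as nn


class SharedResidualStack(nn.Module):
    """Applies the same residual transform `steps` times with shared
    convolution weights, but gives each unrolled step its own
    BatchNorm, mitigating the exploding-activation problem that
    naive full sharing causes."""

    def __init__(self, channels, steps=4):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)  # shared
        self.bns = nn.ModuleList([nn.BatchNorm2d(channels)       # unshared
                                  for _ in range(steps)])

    def forward(self, x):
        for bn in self.bns:                 # unrolled refinement steps
            x = x + torch.relu(bn(self.conv(x)))
        return x
</code>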

https://arxiv.org/abs/1804.07209 NAIS-Net: Stable Deep Networks from Non-Autonomous Differential Equations

We believe that cross-breeding machine learning and control theory will open up many new interesting avenues for research, and that more robust and stable variants of commonly used neural networks, both feed-forward and recurrent, will be possible.