residual [2018/04/23 11:26] (current)
https://arxiv.org/pdf/1710.10348v1.pdf Multi-level Residual Networks from Dynamical Systems View
https://arxiv.org/pdf/1709.01507.pdf Squeeze-and-Excitation Networks
In this work, we focus on channels and propose a novel architectural unit, which we term the “Squeeze-and-Excitation” (SE) block, that adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels.
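The channel-recalibration idea can be sketched in a few lines of NumPy. This is only an illustration of the squeeze (global pooling), excitation (bottleneck MLP with sigmoid gates), and rescaling steps; the shapes, reduction ratio, and omitted biases are simplifications, not the paper's exact block:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def se_block(x, w1, w2):
    """SE-style recalibration of a feature map x with shape (C, H, W)."""
    # Squeeze: global average pooling gives one descriptor per channel.
    z = x.mean(axis=(1, 2))                       # (C,)
    # Excitation: bottleneck MLP + sigmoid -> per-channel gates in (0, 1).
    s = sigmoid(w2 @ np.maximum(w1 @ z, 0.0))     # (C,)
    # Recalibrate: rescale every channel of x by its gate.
    return x * s[:, None, None]

rng = np.random.default_rng(0)
C, r = 8, 2                                       # r = reduction ratio (illustrative)
x = rng.standard_normal((C, 5, 5))
w1 = rng.standard_normal((C // r, C)) * 0.1       # squeeze/reduction weights
w2 = rng.standard_normal((C, C // r)) * 0.1       # excitation/expansion weights
y = se_block(x, w1, w2)                           # same shape as x, channels rescaled
```

Because the gates lie strictly between 0 and 1, the block can only attenuate channels in this toy form; in a network the gates are learned jointly with the rest of the weights.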
https://arxiv.org/pdf/1711.07971.pdf Non-local Neural Networks
In this paper, we present non-local operations as a generic family of building blocks for capturing long-range dependencies. Inspired by the classical non-local means method [4] in computer vision, our non-local operation computes the response at a position as a weighted sum of the features at all positions. This building block can be plugged into many computer vision architectures.
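The "weighted sum of the features at all positions" can be sketched in NumPy using the embedded-Gaussian form (softmax over pairwise dot products). Positions are flattened to rows here, and the projection shapes are chosen purely for illustration:

```python
import numpy as np

def non_local(x, w_theta, w_phi, w_g):
    """Embedded-Gaussian non-local operation.

    x has shape (N, C): N positions (e.g. flattened H*W), C channels.
    The response at each position is a weighted sum of g(x) over ALL
    positions, with weights softmax(theta(x) @ phi(x).T).
    """
    theta, phi, g = x @ w_theta, x @ w_phi, x @ w_g
    logits = theta @ phi.T                         # (N, N) pairwise affinities
    logits -= logits.max(axis=1, keepdims=True)    # for numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)        # each row sums to 1
    return attn @ g                                # weighted sum over all positions

rng = np.random.default_rng(1)
N, C = 16, 8
w_theta, w_phi, w_g = (rng.standard_normal((C, C)) * 0.1 for _ in range(3))
x = rng.standard_normal((N, C))
y = non_local(x, w_theta, w_phi, w_g)              # (N, C) non-local responses
```

In the paper the block additionally projects the output back and adds a residual connection, so it can be dropped into an existing architecture without changing its behaviour at initialization.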
https://arxiv.org/abs/1710.04773v2 Residual Connections Encourage Iterative Inference
Resnets are able to perform both representation learning and iterative refinement. In general, a Resnet block tends to concentrate representation-learning behaviour in the first few layers, while higher layers perform iterative refinement of features. Finally, we observe that naively sharing residual layers leads to representation explosion and, counterintuitively, overfitting, and we show that simple existing strategies can help alleviate this problem. An unshared Batch Normalization strategy therefore mitigates this exploding-activation problem.
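A toy NumPy experiment conveys the flavour of the explosion and its mitigation. The scales are chosen only for illustration, and per-step re-standardization stands in for unshared BatchNorm statistics; this is not the paper's exact setup:

```python
import numpy as np

def unroll(x, w, steps, normalize):
    """Re-apply one residual layer `steps` times (naive weight sharing).

    With normalize=True the activations are re-standardized at every
    step, a crude stand-in for *unshared* per-step BatchNorm statistics.
    """
    norms = []
    for _ in range(steps):
        x = x + x @ w                              # shared residual transform
        if normalize:
            x = (x - x.mean()) / (x.std() + 1e-8)  # per-step normalization
        norms.append(float(np.linalg.norm(x)))
    return norms

rng = np.random.default_rng(2)
x = rng.standard_normal((4, 8))
w = rng.standard_normal((8, 8)) * 0.5
naive = unroll(x, w, steps=10, normalize=False)     # activation norms grow
mitigated = unroll(x, w, steps=10, normalize=True)  # norms stay bounded
```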
https://arxiv.org/abs/1804.07209 NAIS-Net: Stable Deep Networks from Non-Autonomous Differential Equations
Non-autonomy is implemented by skip connections from the block input to each of the unrolled processing stages, and allows stability to be enforced so that blocks can be unrolled adaptively to a pattern-dependent processing depth. We prove that the network is globally asymptotically stable, so that for every initial condition there is exactly one input-dependent equilibrium assuming tanh units, and multiple stable equilibria for ReLU units.
We believe that cross-breeding machine learning and control theory will open up many new interesting avenues for research, and that more robust and stable variants of commonly used neural networks, both feed-forward and recurrent, will be possible.
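The non-autonomous unrolling can be sketched as the iteration x_{k+1} = x_k + h · tanh(A x_k + u + b), where the block input u re-enters at every stage through the skip connection. In this NumPy toy, the input projection is dropped (effectively the identity) and A = -(RᵀR + 0.5·I) is one simple negative-definite parametrization, chosen for the sketch rather than taken from the paper; with tanh units the unrolled block settles into the same input-dependent equilibrium from different initial conditions:

```python
import numpy as np

def nais_block(u, x0, r, b, h=0.1, steps=500):
    """One non-autonomous block unrolled for `steps` stages:

        x_{k+1} = x_k + h * tanh(A @ x_k + u + b)

    The block input u is fed into EVERY stage (the skip connection that
    makes the dynamics non-autonomous). A = -(R^T R + 0.5*I) is negative
    definite, so with tanh units the iteration contracts toward a unique,
    input-dependent equilibrium.
    """
    n = u.shape[0]
    A = -(r.T @ r + 0.5 * np.eye(n))   # stability-enforcing parametrization
    x = x0.astype(float).copy()
    for _ in range(steps):
        x = x + h * np.tanh(A @ x + u + b)
    return x

rng = np.random.default_rng(3)
n = 6
r_mat = rng.standard_normal((n, n)) * 0.3
b = rng.standard_normal(n) * 0.1
u = rng.standard_normal(n)
x_a = nais_block(u, np.zeros(n), r_mat, b)             # start at the origin
x_b = nais_block(u, rng.standard_normal(n), r_mat, b)  # start somewhere else
```

Both runs end at (numerically) the same state, which is the "exactly one input-dependent equilibrium" behaviour the abstract describes for tanh units; changing u moves the equilibrium.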