We present a variational approximation to the information bottleneck of Tishby et al. (1999). This variational approach allows us to parameterize the information bottleneck model using a neural network and leverage the reparameterization trick for efficient training. We call this method "Deep Variational Information Bottleneck", or Deep VIB. We show that models trained with the VIB objective outperform those that are trained with other forms of regularization, in terms of generalization performance and robustness to adversarial attack.
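
As a rough illustration of the idea (not the authors' code), the following is a minimal PyTorch sketch of a VIB-style classifier: a stochastic encoder parameterized by a neural network, the reparameterization trick to sample the bottleneck variable, and a beta-weighted KL term against a standard-normal prior. The layer sizes, bottleneck dimension and beta value are illustrative.

<code python>
# Minimal sketch of a VIB-style classifier (assumptions: Gaussian encoder,
# standard-normal prior, illustrative layer sizes and beta).
import torch
import torch.nn as nn
import torch.nn.functional as F

class VIBClassifier(nn.Module):
    def __init__(self, in_dim=784, bottleneck=32, n_classes=10, beta=1e-3):
        super().__init__()
        self.beta = beta
        self.encoder = nn.Sequential(nn.Linear(in_dim, 1024), nn.ReLU(),
                                     nn.Linear(1024, 2 * bottleneck))
        self.decoder = nn.Linear(bottleneck, n_classes)

    def forward(self, x, y):
        mu, logvar = self.encoder(x).chunk(2, dim=-1)
        # Reparameterization trick: z = mu + sigma * eps keeps sampling differentiable.
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        logits = self.decoder(z)
        # Variational bound: classification term + beta * KL(q(z|x) || N(0, I)).
        ce = F.cross_entropy(logits, y)
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=-1).mean()
        return ce + self.beta * kl, logits
</code>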

https://arxiv.org/abs/1804.07090 Low Rank Structure of Learned Representations

In this paper, we study the dimensionality of the representations learned by models that have proved highly successful for image classification. We focus on ResNet-18, ResNet-50 and VGG-19 and observe that, when trained on the CIFAR10 or CIFAR100 datasets, the learned representations exhibit a fairly low-rank structure. We propose a modification to the training procedure which further encourages low-rank representations of activations at various stages in the neural network. Empirically, we show that this has implications for compression and robustness to adversarial examples.
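
A quick way to see this kind of low-rank structure is to collect a layer's activations for a batch of inputs and inspect their singular value spectrum. The sketch below does this for a torchvision ResNet-18 (recent torchvision API) with random inputs standing in for a CIFAR-10 batch; the chosen layer, batch size and 99% energy threshold are illustrative, and in practice one would use actual trained weights and real data.

<code python>
# Sketch: probing the (effective) rank of a layer's activations with an SVD.
# The model, the chosen layer, and the 0.99 energy threshold are illustrative.
import torch
import torchvision

model = torchvision.models.resnet18(weights=None).eval()  # trained weights in practice
x = torch.randn(256, 3, 32, 32)           # stand-in for a CIFAR-10 batch

feats = {}
def hook(module, inputs, output):
    feats["acts"] = output.flatten(1)     # (batch, features)

model.layer4.register_forward_hook(hook)
with torch.no_grad():
    model(x)

s = torch.linalg.svdvals(feats["acts"])
energy = torch.cumsum(s**2, 0) / (s**2).sum()
effective_rank = int((energy < 0.99).sum()) + 1
print(f"{len(s)} singular values, ~{effective_rank} capture 99% of the energy")
</code>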
The modification “adds” virtual low-rank layers to the model that ensure that the learned representations roughly lie in a low-rank space. The modified objective function is optimized using an alternating minimization approach, reminiscent of that used in iterative hard thresholding (Blumensath and Davies, 2009) or singular value projection (Jain et al., 2010). Using a naïve singular value thresholding approach would render the training intractable for all practical purposes; a column-sampling-based Nyström method (Williams and Seeger, 2001; Halko et al., 2011) is used instead to achieve a significant speed-up, though at the cost of not obtaining the optimal low-rank projections. One can view this modified training process as a way to constrain the neural network, though in a way that is very different from the widely used sparsity-inducing methods (e.g. Anwar et al., 2017; Wen et al., 2016) or structurally constrained methods (e.g. Moczulski et al., 2015; Liu et al., 2015) that seek to tackle the problem of over-parametrization.
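
The sketch below gives a sense of what one such rank-constraining projection step could look like: a hard rank-k projection of a batch of activations via truncated SVD, plus a cheaper column-sampling variant in the spirit of a Nyström/randomized approximation. The rank k and the number of sampled columns are made-up values, and this is not the authors' exact alternating-minimization procedure, only the projection building block.

<code python>
# Sketch of a "virtual" low-rank projection step for a batch of activations.
# Rank k and the number of sampled columns are illustrative choices.
import torch

def project_rank_k(acts: torch.Tensor, k: int) -> torch.Tensor:
    """Project (batch, features) activations onto their top-k singular subspace."""
    U, S, Vh = torch.linalg.svd(acts, full_matrices=False)
    return U[:, :k] @ torch.diag(S[:k]) @ Vh[:k, :]

def project_rank_k_sampled(acts: torch.Tensor, k: int, n_cols: int) -> torch.Tensor:
    """Cheaper variant: build an approximate rank-k projector from sampled feature columns."""
    idx = torch.randperm(acts.shape[1])[:n_cols]
    C = acts[:, idx]                          # sampled columns, (batch, n_cols)
    U, _, _ = torch.linalg.svd(C, full_matrices=False)
    basis = U[:, :k]                          # approximate top-k left singular vectors
    return basis @ (basis.T @ acts)           # project activations onto that subspace

acts = torch.randn(256, 512)                  # stand-in for a layer's activations
low_rank = project_rank_k(acts, k=16)
approx = project_rank_k_sampled(acts, k=16, n_cols=64)
</code>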