Hyper Parameter Tuning
  
  
  
https://einstein.ai/research/domain-specific-language-for-automated-rnn-architecture-search A domain-specific language for automated RNN architecture search

https://arxiv.org/abs/1801.01563v1 DENSER: Deep Evolutionary Network Structured Representation

The algorithm not only searches for the best network topology (e.g., number of layers, type of layers) but also tunes hyper-parameters such as learning parameters or data augmentation parameters. The automatic design is achieved using a representation with two distinct levels, where the outer level encodes the general structure of the network, i.e., the sequence of layers, and the inner level encodes the parameters associated with each layer.
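The two-level encoding lends itself to a simple nested structure. A minimal sketch, with field names and value ranges as illustrative assumptions rather than DENSER's actual grammar-based genotype:

<code python>
# Hedged sketch of a two-level DENSER-style genotype: the outer level is the
# ordered sequence of layers, the inner level holds each layer's parameters.
genotype = {
    "outer": [  # network topology: an ordered list of layer genes
        {"type": "conv",  "inner": {"filters": 64, "kernel": 3, "activation": "relu"}},
        {"type": "pool",  "inner": {"kind": "max", "size": 2}},
        {"type": "dense", "inner": {"units": 128, "activation": "sigmoid"}},
    ],
    # learning / data-augmentation parameters are evolved alongside the topology
    "learning": {"optimizer": "adam", "lr": 1e-3, "batch_size": 128},
}

# Outer-level mutations add, remove or reorder layer genes; inner-level
# mutations perturb the parameters of a single layer gene.
</code>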

https://arxiv.org/pdf/1801.01952v1.pdf Generating Neural Networks with Neural Networks

We formulate the hypernetwork training objective as a compromise between accuracy and diversity, where the diversity takes into account trivial symmetry transformations of the target network.
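One hedged way to write that objective down (the notation below is assumed, not taken from the paper): the hypernetwork G_phi maps a noise code z to target-network weights, and training balances task loss against a diversity term measured only up to the trivial symmetry transformations T of the target network.

<code latex>
% Illustrative form of an accuracy-vs-diversity objective (assumed notation)
\mathcal{L}(\varphi) =
  \underbrace{\mathbb{E}_{z,(x,y)}\,\ell\!\left(f_{G_\varphi(z)}(x),\, y\right)}_{\text{accuracy}}
  \;-\; \lambda\,
  \underbrace{\mathbb{E}_{z_1,z_2}\,\min_{T \in \mathcal{T}}
      d\!\left(G_\varphi(z_1),\, T(G_\varphi(z_2))\right)}_{\text{diversity up to symmetries}}
</code>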

https://arxiv.org/abs/1801.05159v1 GitGraph - Architecture Search Space Creation through Frequent Computational Subgraph Mining

Concretely, we (a) extract and publish GitGraph, a corpus of neural architectures and their descriptions; (b) create problem-specific neural architecture search spaces, implemented as a textual search mechanism over GitGraph; and (c) propose a method of identifying unique common subgraphs within the architectures solving each problem (e.g., image processing, reinforcement learning), which can then serve as modules in the newly created problem-specific neural search space.
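A rough sketch of what the textual search step could look like; the corpus format and the frequent_subgraphs helper are hypothetical stand-ins for GitGraph's actual tooling:

<code python>
# Hedged sketch: filter a GitGraph-style corpus by problem keywords, then mine
# the matching architectures' frequent subgraphs as candidate modules for the
# problem-specific search space. `frequent_subgraphs` is a hypothetical helper.
def build_search_space(corpus, problem_keywords, min_support=5):
    matching = [entry["graph"] for entry in corpus
                if any(kw in entry["description"].lower() for kw in problem_keywords)]
    return frequent_subgraphs(matching, min_support=min_support)

# e.g. build_search_space(corpus, ["image classification", "cifar"], min_support=10)
</code>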

https://arxiv.org/abs/1802.01561v1 IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures

We have developed a new distributed agent, IMPALA (Importance Weighted Actor-Learner Architecture), that can scale to thousands of machines and achieve a throughput rate of 250,000 frames per second. We achieve stable learning at high throughput by combining decoupled acting and learning with a novel off-policy correction method called V-trace, which was critical for achieving learning stability. We demonstrate the effectiveness of IMPALA for multi-task reinforcement learning on DMLab-30 (a set of 30 tasks from the DeepMind Lab environment (Beattie et al., 2016)) and Atari-57 (all available Atari games in the Arcade Learning Environment (Bellemare et al., 2013a)). Our results show that IMPALA is able to achieve better performance than previous agents while using less data, and it crucially exhibits positive transfer between tasks as a result of its multi-task approach.

To the best of our knowledge, IMPALA is the first Deep-RL agent that has been successfully tested in such large-scale multi-task settings, and it has shown superior performance compared to A3C-based agents (49.4% vs. 23.8% human normalised score). Most importantly, our experiments on DMLab-30 show that, in the multi-task setting, positive transfer between individual tasks leads IMPALA to achieve better performance compared to the expert training setting. We believe that IMPALA provides a very simple yet scalable and robust framework for building better Deep-RL agents and has the potential to enable research on new challenges.

Unlike the popular A3C-based agents, in which workers communicate gradients with respect to the parameters of the policy to a central parameter server, IMPALA actors communicate trajectories of experience (sequences of states, actions, and rewards) to a centralised learner.
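Because the actors lag behind the learner's policy, the learner corrects the trajectories with V-trace. A minimal sketch of the V-trace target computation from the paper, with array shapes as assumptions and episode-boundary masking omitted for brevity:

<code python>
import numpy as np

def vtrace_targets(behaviour_logp, target_logp, rewards, values,
                   bootstrap_value, gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """Sketch of V-trace value targets (arXiv:1802.01561). All inputs are 1-D
    arrays over one trajectory of length T, except bootstrap_value, the value
    estimate for the state that follows the trajectory."""
    ratios = np.exp(target_logp - behaviour_logp)   # importance sampling ratios
    rhos = np.minimum(rho_bar, ratios)              # truncated: controls bias
    cs = np.minimum(c_bar, ratios)                  # truncated: controls variance
    next_values = np.append(values[1:], bootstrap_value)
    deltas = rhos * (rewards + gamma * next_values - values)
    vs_minus_v = np.zeros_like(values, dtype=float)
    acc = 0.0
    # Backward recursion: (vs_t - V_t) = delta_t + gamma * c_t * (vs_{t+1} - V_{t+1})
    for t in reversed(range(len(values))):
        acc = deltas[t] + gamma * cs[t] * acc
        vs_minus_v[t] = acc
    return values + vs_minus_v   # the learner regresses V towards these targets
</code>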

https://arxiv.org/abs/1802.03268 Efficient Neural Architecture Search via Parameters Sharing

We propose Efficient Neural Architecture Search (ENAS), a fast and inexpensive approach for automatic model design. In ENAS, a controller learns to discover neural network architectures by searching for an optimal subgraph within a large computational graph. The controller is trained with policy gradient to select a subgraph that maximizes the expected reward on the validation set. Meanwhile, the model corresponding to the selected subgraph is trained to minimize a canonical cross-entropy loss. Thanks to parameter sharing between child models, ENAS is fast: it delivers strong empirical performance using far fewer GPU-hours than all existing automatic model design approaches and, notably, is 1000x less expensive than standard Neural Architecture Search. On the Penn Treebank dataset, ENAS discovers a novel architecture that achieves a test perplexity of 55.8, establishing a new state of the art among all methods without post-training processing. On the CIFAR-10 dataset, ENAS designs novel architectures that achieve a test error of 2.89%, which is on par with NASNet (Zoph et al., 2018), whose test error is 2.65%.

The main contribution of this work is to improve the efficiency of NAS by forcing all child models to share weights. The idea has apparent complications, as different child models might utilize their weights differently, but it was encouraged by previous work on transfer learning and multitask learning, which has established that parameters learned for a particular model on a particular task can be used for other models on other tasks with little to no modification.
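A minimal sketch of the weight-sharing idea, assuming PyTorch and a toy three-operation search space; the controller itself is only indicated in comments:

<code python>
import torch
import torch.nn as nn

# Hedged sketch of ENAS-style weight sharing: every candidate operation lives in
# one shared bank, so a sampled architecture is just a choice of which shared op
# to apply at each node. The 3-op bank and helper names are assumptions.
SHARED_OPS = nn.ModuleList([
    nn.Conv2d(16, 16, 3, padding=1),
    nn.Conv2d(16, 16, 5, padding=2),
    nn.MaxPool2d(3, stride=1, padding=1),
])

def forward_child(x, arch):
    """Run the child model defined by `arch`, a sequence of op indices."""
    for op_idx in arch:
        x = SHARED_OPS[op_idx](x)
    return x

def sample_architecture(controller_logits):
    """Sample one op index per node from the controller's categorical outputs
    (controller_logits has shape [num_nodes, num_ops])."""
    return torch.distributions.Categorical(logits=controller_logits).sample().tolist()

# Training alternates two phases:
#  1. sample architectures and update the *shared* SHARED_OPS weights on
#     training batches (standard cross-entropy);
#  2. freeze the shared weights and update the controller with REINFORCE,
#     using validation accuracy of sampled child models as the reward.
</code>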

https://arxiv.org/abs/1802.05351 Stealing Hyperparameters in Machine Learning

Our results highlight the need for new defenses against our hyperparameter stealing attacks for certain machine learning algorithms.

https://arxiv.org/abs/1802.04821 Evolved Policy Gradients

We propose a meta-learning approach for learning gradient-based reinforcement learning (RL) algorithms. The idea is to evolve a differentiable loss function, such that an agent which optimizes its policy to minimize this loss will achieve high rewards. The loss is parametrized via temporal convolutions over the agent's experience. Because this loss is highly flexible in its ability to take into account the agent's history, it enables fast task learning and eliminates the need for reward shaping at test time. Empirical results show that our evolved policy gradient algorithm achieves faster learning on several randomized environments compared to an off-the-shelf policy gradient method. Moreover, at test time, our learner optimizes only its learned loss function and requires no explicit reward signal. In effect, the agent internalizes the reward structure, suggesting a direction toward agents that learn to solve new tasks simply from intrinsic motivation.
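The outer loop is an evolution-strategies update over the loss parameters. A minimal sketch under that assumption, where train_agent_with_loss and evaluate_returns are hypothetical helpers standing in for the paper's inner-loop training and rollout evaluation, and phi is a flat parameter vector:

<code python>
import numpy as np

def evolve_loss(phi, sigma=0.05, alpha=0.01, population=16, generations=100):
    """Hedged sketch of an ES outer loop over the parameters `phi` of a learned
    loss function, in the spirit of Evolved Policy Gradients."""
    for _ in range(generations):
        noise = np.random.randn(population, phi.size)
        fitness = np.empty(population)
        for i in range(population):
            candidate = phi + sigma * noise[i]        # perturbed loss parameters
            agent = train_agent_with_loss(candidate)  # inner loop: agent minimizes the evolved loss
            fitness[i] = evaluate_returns(agent)      # outer objective: true environment return
        fitness = (fitness - fitness.mean()) / (fitness.std() + 1e-8)
        phi = phi + alpha / (population * sigma) * noise.T @ fitness  # ES gradient estimate
    return phi
</code>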

https://arxiv.org/abs/1711.00436v2 Hierarchical Representations for Efficient Architecture Search

Our approach combines a novel hierarchical genetic representation scheme that imitates the modularized design pattern commonly adopted by human experts, and an expressive search space that supports complex topologies. Our algorithm efficiently discovers architectures that outperform a large number of manually designed models for image classification, obtaining top-1 error of 3.6% on CIFAR-10 and 20.3% when transferred to ImageNet, which is competitive with the best existing neural architecture search approaches.
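The hierarchical genotype can be pictured as graphs whose edges are labelled with graphs from the level below; the encoding below is an illustrative assumption, not the paper's exact data structure:

<code python>
# Hedged sketch of a hierarchical genotype: a level-l motif is a small DAG whose
# edges are labelled with motifs (or primitive ops) from level l-1.
PRIMITIVES = ["conv3x3", "conv1x1", "maxpool3x3", "identity"]

# Level-1 motif: edge (i, j) applies the named primitive to node i's output.
level1_motif = {(0, 1): "conv3x3", (0, 2): "conv1x1", (1, 2): "maxpool3x3"}

# Level-2 motif: edges reference whole level-1 motifs instead of primitives.
level2_motif = {(0, 1): level1_motif, (1, 2): level1_motif}

# Evolution mutates a random motif by relabelling one of its edges with another
# motif/primitive from the level below, then selects on the validation accuracy
# of the fully assembled (flattened) architecture.
</code>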

https://arxiv.org/abs/1803.07055 Simple random search provides a competitive approach to reinforcement learning

https://arxiv.org/pdf/1805.07440.pdf AlphaX: eXploring Neural Architectures with Deep Neural Networks and Monte Carlo Tree Search

AlphaX also generates the training data for Meta-DNN, so the learning of Meta-DNN is end-to-end. In searching for NASNet-style architectures, AlphaX found several promising architectures with up to 1% higher accuracy than NASNet using only 17 GPUs for 5 days, demonstrating up to a 23.5x speedup over the original NASNet search, which used 500 GPUs for 4 days.
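A minimal sketch of how a learned performance predictor can replace full training inside MCTS over architectures; meta_dnn_predict and mutate are hypothetical helpers, and the node fields are assumptions rather than AlphaX's implementation:

<code python>
import math

class Node:
    def __init__(self, arch, parent=None):
        self.arch, self.parent = arch, parent
        self.children, self.visits, self.value = [], 0, 0.0

def ucb(node, c=1.4):
    # Standard UCB1 score; unvisited nodes get a large exploration bonus.
    return (node.value / (node.visits + 1e-8)
            + c * math.sqrt(math.log(node.parent.visits + 1) / (node.visits + 1e-8)))

def search_step(root):
    node = root
    while node.children:                           # selection
        node = max(node.children, key=ucb)
    child = Node(mutate(node.arch), parent=node)   # expansion: modify the architecture
    node.children.append(child)
    reward = meta_dnn_predict(child.arch)          # "simulation": predicted accuracy,
                                                   # so no per-candidate training here
    while child is not None:                       # backpropagation
        child.visits += 1
        child.value += reward
        child = child.parent
</code>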

https://arxiv.org/abs/1808.05377 Neural Architecture Search: A Survey

We provide an overview of existing work in this field of research and categorize it according to three dimensions: search space, search strategy, and performance estimation strategy.

https://arxiv.org/abs/1809.04270 Rapid Training of Very Large Ensembles of Diverse Neural Networks

Our approach captures the structural similarity between members of a neural network ensemble and trains it only once. Subsequently, this knowledge is transferred to all members of the ensemble using function-preserving transformations. These ensemble networks then converge significantly faster than when trained from scratch.
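Function-preserving transformations are what make this possible. A minimal sketch of Net2Net-style widening for a single fully connected hidden layer, with shapes as assumptions (the paper's transformations may differ in detail):

<code python>
import numpy as np

def widen_layer(W1, W2, new_width):
    """Widen a hidden layer from W1.shape[1] to new_width units while keeping
    the composed function relu(x @ W1) @ W2 unchanged (Net2Net-style sketch)."""
    old_width = W1.shape[1]
    # Map every new unit to an existing unit (existing units map to themselves).
    mapping = np.concatenate([np.arange(old_width),
                              np.random.randint(0, old_width, new_width - old_width)])
    W1_new = W1[:, mapping]                              # duplicate incoming weights
    counts = np.bincount(mapping, minlength=old_width).astype(float)
    W2_new = W2[mapping, :] / counts[mapping][:, None]   # rescale outgoing weights
    return W1_new, W2_new
</code>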

https://arxiv.org/abs/1810.05749v1 Graph HyperNetworks for Neural Architecture Search

GHNs model the topology of an architecture and can therefore predict network performance more accurately than regular hypernetworks and premature early stopping. To perform NAS, we randomly sample architectures and use the validation accuracy of networks with GHN-generated weights as the surrogate search signal. GHNs are fast: they can search nearly 10 times faster than other random search methods on CIFAR-10 and ImageNet.
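The surrogate search loop described above is easy to sketch; sample_random_architecture, ghn_generate_weights, validation_accuracy and train_from_scratch are hypothetical helpers:

<code python>
def ghn_random_search(num_samples=1000, top_k=10):
    """Hedged sketch: rank randomly sampled architectures by the validation
    accuracy they reach with GHN-generated weights, then fully train only the
    top-ranked candidates."""
    scored = []
    for _ in range(num_samples):
        arch = sample_random_architecture()
        weights = ghn_generate_weights(arch)     # no per-architecture training
        scored.append((validation_accuracy(arch, weights), arch))
    scored.sort(key=lambda pair: pair[0], reverse=True)   # surrogate ranking
    return [train_from_scratch(arch) for _, arch in scored[:top_k]]
</code>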

https://ai.googleblog.com/2018/10/introducing-adanet-fast-and-flexible.html?m=1