Curiosity

https://arxiv.org/abs/1611.04717v1 #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning

https://arxiv.org/abs/1606.01868v2 Unifying Count-Based Exploration and Intrinsic Motivation https://deepmind.com/blog/deepmind-papers-nips-part-3/

https://arxiv.org/abs/1605.09674 VIME: Variational Information Maximizing Exploration (Houthooft et al.)

https://www.technologyreview.com/s/603366/mathematical-model-reveals-the-patterns-of-how-innovations-arise/ Mathematical Model Reveals the Patterns of How Innovations Arise

https://arxiv.org/abs/1611.09321v2 Improving Policy Gradient by Exploring Under-appreciated Rewards

This paper presents a novel form of policy gradient for model-free reinforcement learning (RL) with improved exploration properties. Current policy-based methods use entropy regularization to encourage undirected exploration of the reward landscape, which is ineffective in high dimensional spaces with sparse rewards. We propose a more directed exploration strategy that promotes exploration of under-appreciated reward regions. An action sequence is considered under-appreciated if its log-probability under the current policy under-estimates its resulting reward. The proposed exploration strategy is easy to implement, requiring small modifications to an implementation of the REINFORCE algorithm. We evaluate the approach on a set of algorithmic tasks that have long challenged RL methods. Our approach reduces hyper-parameter sensitivity and demonstrates significant improvements over baseline methods. Our algorithm successfully solves a benchmark multi-digit addition task and generalizes to long sequences. This is, to our knowledge, the first time that a pure RL method has solved addition using only reward feedback.
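
A minimal numpy sketch of the under-appreciated reward weighting (the rewards, log-probabilities, and temperature tau below are toy placeholders, not the paper's experiments): sampled action sequences whose reward, scaled by 1/tau, exceeds their log-probability under the current policy receive larger exploration weight.

    import numpy as np

    def urex_weights(rewards, log_probs, tau=0.1):
        """Self-normalized weights for under-appreciated reward exploration.

        A sequence is 'under-appreciated' when tau * log pi(sequence)
        underestimates its reward, i.e. rewards / tau - log_probs is large;
        a softmax turns that mismatch into weights over the sampled sequences.
        """
        scores = rewards / tau - log_probs   # large when reward is under-appreciated
        scores = scores - scores.max()       # numerical stability
        w = np.exp(scores)
        return w / w.sum()

    # Toy usage: 4 action sequences sampled from the current policy.
    rewards   = np.array([1.0, 0.0, 2.0, 0.5])      # environment returns (placeholder)
    log_probs = np.array([-2.3, -0.1, -9.2, -1.0])  # log pi(sequence) (placeholder)
    print(urex_weights(rewards, log_probs, tau=0.1))

In this toy example the third sequence dominates the weights: its reward is high while the policy assigns it very low probability, which is exactly the under-appreciated case the resulting gradient pushes towards.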

https://arxiv.org/pdf/1612.02605.pdf Towards Information-Seeking Agents

https://arxiv.org/abs/1706.10295 Noisy Networks for Exploration

We introduce NoisyNet, a deep reinforcement learning agent with parametric noise added to its weights, and show that the induced stochasticity of the agent's policy can be used to aid efficient exploration. The parameters of the noise are learned with gradient descent along with the remaining network weights. NoisyNet is straightforward to implement and adds little computational overhead. We find that replacing the conventional exploration heuristics for A3C, DQN and dueling agents (entropy reward and ϵ-greedy respectively) with NoisyNet yields substantially higher scores for a wide range of Atari games, in some cases advancing the agent from sub to super-human performance.
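
A minimal numpy sketch of a noisy linear layer with the paper's factorised Gaussian noise (the layer sizes are placeholders, and treating mu and sigma as plain arrays rather than backprop-trained weights is a simplification):

    import numpy as np

    def f(x):
        # Factorised-noise transform from the NoisyNet paper.
        return np.sign(x) * np.sqrt(np.abs(x))

    class NoisyLinear:
        """y = (mu_w + sigma_w * eps_w) x + (mu_b + sigma_b * eps_b).

        mu_* and sigma_* are the learnable parameters; eps_* is resampled
        factorised Gaussian noise, so exploration comes from perturbing the
        weights rather than from epsilon-greedy action noise.
        """
        def __init__(self, in_dim, out_dim, sigma0=0.5):
            bound = 1.0 / np.sqrt(in_dim)
            self.mu_w = np.random.uniform(-bound, bound, (out_dim, in_dim))
            self.mu_b = np.random.uniform(-bound, bound, out_dim)
            self.sigma_w = np.full((out_dim, in_dim), sigma0 / np.sqrt(in_dim))
            self.sigma_b = np.full(out_dim, sigma0 / np.sqrt(in_dim))
            self.in_dim, self.out_dim = in_dim, out_dim
            self.resample_noise()

        def resample_noise(self):
            eps_in = np.random.randn(self.in_dim)
            eps_out = np.random.randn(self.out_dim)
            self.eps_w = np.outer(f(eps_out), f(eps_in))
            self.eps_b = f(eps_out)

        def __call__(self, x):
            w = self.mu_w + self.sigma_w * self.eps_w
            b = self.mu_b + self.sigma_b * self.eps_b
            return w @ x + b

    layer = NoisyLinear(4, 2)
    print(layer(np.ones(4)))  # output changes after each resample_noise()

In the full agent mu and sigma are trained by gradient descent and the noise is resampled as the agent acts, so the scale of exploration is learned per parameter instead of being fixed by a global epsilon.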

http://antoniosliapis.com/papers/coupling_novelty_and_surprise_for_evolutionary_divergence.pdf Coupling Novelty and Surprise for Evolutionary Divergence

As novelty and surprise search have already shown much promise individually, the hypothesis is that an evolutionary process that rewards both novel and surprising solutions will be able to handle deception in a better fashion and lead to more successful solutions faster. In this paper we introduce an algorithm that realises both novelty and surprise search and we compare it against the two algorithms that compose it in a number of robot navigation tasks.
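
A rough sketch of how such a combined score could look (an illustration under assumed definitions, not the paper's exact formulation: novelty as mean distance to the k nearest archived behaviours, surprise as deviation from a predicted behaviour):

    import numpy as np

    def novelty(b, archive, k=3):
        # Mean distance to the k nearest behaviours seen so far.
        d = np.sort(np.linalg.norm(archive - b, axis=1))
        return d[:k].mean()

    def surprise(b, predicted_b):
        # Deviation from where a simple model predicted behaviours would go.
        return np.linalg.norm(b - predicted_b)

    def divergence_score(b, archive, predicted_b, lam=0.5):
        # Weighted sum rewarding solutions that are both novel and surprising.
        return lam * novelty(b, archive) + (1 - lam) * surprise(b, predicted_b)

    archive = np.random.rand(20, 2)          # past behaviour descriptors (placeholder)
    b = np.array([0.9, 0.1])                 # candidate's behaviour (placeholder)
    predicted = archive[-5:].mean(axis=0)    # crude trend prediction (assumption)
    print(divergence_score(b, archive, predicted))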

http://bica2017.bicasociety.org/wp-content/uploads/2017/08/BICA_2017_paper_89.pdf A Robust Cognitive Architecture for Learning from Surprises

https://arxiv.org/abs/1710.11089 Eigenoption Discovery through the Deep Successor Representation

http://www.marcgbellemare.info/static/publications/ostrovski17countbased.pdf Count-Based Exploration with Neural Density Models

https://arxiv.org/abs/1705.05363 Curiosity-driven Exploration by Self-supervised Prediction

We formulate curiosity as the error in an agent's ability to predict the consequence of its own actions in a visual feature space learned by a self-supervised inverse dynamics model. Our formulation scales to high-dimensional continuous state spaces like images, bypasses the difficulties of directly predicting pixels, and, critically, ignores the aspects of the environment that cannot affect the agent.
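
A minimal numpy sketch of that intrinsic reward (the random-weight encoder and forward model below are placeholders for the learned networks; in the paper both are trained, the encoder via the inverse-dynamics loss):

    import numpy as np

    rng = np.random.default_rng(0)

    # Stand-ins for the learned networks: phi is the feature encoder trained
    # by the inverse-dynamics loss (predict the action from phi(s), phi(s')),
    # and the forward model predicts the next feature vector.
    W_phi = rng.standard_normal((16, 64))       # encoder weights (placeholder)
    W_fwd = rng.standard_normal((16, 16 + 4))   # forward-model weights (placeholder)

    def phi(s):
        return np.tanh(W_phi @ s)

    def forward_model(feat, a_onehot):
        return np.tanh(W_fwd @ np.concatenate([feat, a_onehot]))

    def intrinsic_reward(s, a_onehot, s_next, eta=0.01):
        # Curiosity = error predicting the feature-space consequence of the
        # action; pixel-level changes the action cannot influence never enter
        # phi, so they generate no reward.
        pred = forward_model(phi(s), a_onehot)
        return 0.5 * eta * np.sum((phi(s_next) - pred) ** 2)

    s, s_next = rng.standard_normal(64), rng.standard_normal(64)
    a = np.eye(4)[2]                             # one-hot action (placeholder)
    print(intrinsic_reward(s, a, s_next))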