Surprises
https://arxiv.org/abs/1710.11089 Eigenoption Discovery through the Deep Successor Representation
http://www.marcgbellemare.info/static/publications/ostrovski17countbased.pdf Count-Based Exploration with Neural Density Models
https://arxiv.org/abs/1705.05363 Curiosity-driven Exploration by Self-supervised Prediction
We formulate curiosity as the error in an agent's ability to predict the consequence of its own actions in a visual feature space learned by a self-supervised inverse dynamics model. Our formulation scales to high-dimensional continuous state spaces like images, bypasses the difficulties of directly predicting pixels, and, critically, ignores the aspects of the environment that cannot affect the agent. https://github.com/pathak22/noreward-rl
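The core of that formulation can be sketched in a few lines: the intrinsic reward is the (scaled) prediction error of a forward model operating in the learned feature space. In this sketch, `encode` and `forward_model` are hypothetical stand-ins for the trained inverse-dynamics feature encoder and the trained forward model, not the actual networks from the paper.

```python
def encode(state):
    # Stand-in for the self-supervised feature encoder phi(s);
    # in the paper this is learned via an inverse dynamics model.
    return [float(x) for x in state]

def forward_model(phi_s, action):
    # Stand-in for the learned forward model predicting phi(s')
    # from phi(s) and the action.
    return [x + action for x in phi_s]

def intrinsic_reward(state, action, next_state, eta=0.5):
    """Curiosity bonus: scaled squared error between predicted and
    actual features of the next state."""
    phi_next_pred = forward_model(encode(state), action)
    phi_next = encode(next_state)
    err = sum((p - q) ** 2 for p, q in zip(phi_next_pred, phi_next))
    return eta * err

# A transition the toy forward model predicts perfectly yields no bonus;
# a surprising transition yields a positive one.
r_expected = intrinsic_reward([0.0, 0.0], 1.0, [1.0, 1.0])   # 0.0
r_surprise = intrinsic_reward([0.0, 0.0], 1.0, [2.0, 2.0])   # 1.0
```

Because the error is computed in feature space rather than pixel space, variations the features ignore (e.g. irrelevant background motion) produce no bonus.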
https://arxiv.org/pdf/1802.10546.pdf Computational Theories of Curiosity-Driven Learning
https://arxiv.org/abs/1806.06505v1 A unified strategy for implementing curiosity and empowerment driven reinforcement learning
https://arxiv.org/abs/1808.05492v1 Metric Learning for Novelty and Anomaly Detection
We show that metric learning provides a better output embedding space for detecting data outside the learned distribution than cross-entropy softmax based models. This opens an opportunity for further research on how this embedding space should be learned, with constraints that could further improve the field. The presented results suggest that out-of-distribution data might not all be seen as a single type of anomaly, but rather as a continuous spectrum between novelty and anomaly data. On that spectrum, anomaly detection is the easier task, which places more focus on the difficulty of novelty detection.
https://openreview.net/forum?id=SkeK3s0qKQ Episodic Curiosity through Reachability
…One solution to this problem is to allow the agent to create rewards for itself, thus making rewards dense and more suitable for learning. In particular, inspired by curious behaviour in animals, observing something novel could be rewarded with a bonus. Such a bonus is summed with the real task reward, making it possible for RL algorithms to learn from the combined reward. We propose a new curiosity method which uses episodic memory to form the novelty bonus.
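The episodic-memory mechanism can be sketched as follows: store visited observations, grant a bonus only when the current observation is far from everything in memory, and train on task reward plus bonus. The distance check here is a toy stand-in for the paper's learned reachability network, and all thresholds are illustrative.

```python
def novelty_bonus(obs, memory, bonus=1.0, radius=0.5):
    """Grant a bonus if `obs` is not close to anything stored in the
    episodic memory, and add it to the memory; otherwise no bonus."""
    def close(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5 < radius
    if any(close(obs, m) for m in memory):
        return 0.0
    memory.append(obs)
    return bonus

def combined_reward(task_reward, obs, memory):
    # The RL algorithm learns from the sum of task reward and bonus,
    # which densifies an otherwise sparse reward signal.
    return task_reward + novelty_bonus(obs, memory)

memory = []
r1 = combined_reward(0.0, [0.0, 0.0], memory)   # novel observation: bonus paid
r2 = combined_reward(0.0, [0.1, 0.0], memory)   # near a stored one: no bonus
```

The memory resets at episode boundaries, so the bonus rewards reaching states that are new within the current episode rather than globally.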
https://arxiv.org/abs/1810.06284 CURIOUS: Intrinsically Motivated Multi-Task, Multi-Goal Reinforcement Learning
This paper proposes CURIOUS, an extension of Universal Value Function Approximators that enables intrinsically motivated agents to learn to achieve both multiple tasks and multiple goals within a unique policy, leveraging hindsight learning. Agents focus on achievable tasks first, using an automated curriculum learning mechanism that biases their attention towards tasks maximizing the absolute learning progress. This mechanism provides robustness to catastrophic forgetting (by refocusing on tasks where performance decreases) and distracting tasks (by avoiding tasks with no absolute learning progress). Furthermore, we show that having two levels of parameterization (tasks and goals within tasks) enables more efficient learning of skills in an environment with a modular physical structure (e.g. multiple objects) as compared to flat, goal-parameterized RL with hindsight experience replay.
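The curriculum mechanism described above can be sketched as sampling tasks in proportion to absolute learning progress: both improving tasks and tasks being forgotten (performance dropping) attract attention, while stagnant tasks are mostly ignored. The progress values below are illustrative, not taken from the paper.

```python
import random

def task_probabilities(progress, eps=1e-6):
    """Sampling distribution proportional to |learning progress|.
    `eps` keeps every task reachable with small probability."""
    lp = [abs(p) + eps for p in progress]
    total = sum(lp)
    return [x / total for x in lp]

def sample_task(probs, rng=random):
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Recent change in success rate per task: task 0 is improving,
# task 1 is stagnant, task 2 is being forgotten.
progress = [0.3, 0.0, -0.2]
probs = task_probabilities(progress)   # roughly [0.6, ~0, 0.4]
chosen = sample_task(probs)
```

Taking the absolute value is what yields the robustness to forgetting: a task whose performance is decreasing regains a high sampling probability, so the agent returns to it before the skill is lost.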