We formulate curiosity as the error in an agent's ability to predict the consequence of its own actions in a visual feature space learned by a self-supervised inverse dynamics model. Our formulation scales to high-dimensional continuous state spaces like images, bypasses the difficulties of directly predicting pixels, and, critically, ignores the aspects of the environment that cannot affect the agent. https://github.com/pathak22/noreward-rl
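
A minimal sketch of this kind of curiosity module (illustrative only, not the authors' code; it assumes flat observation vectors, discrete actions, and PyTorch, and names such as CuriosityModule and feat_dim are made up):

<code python>
# Sketch: features phi(s) are shaped by an inverse-dynamics head that predicts
# the action from (phi(s_t), phi(s_t+1)); the intrinsic reward is the forward
# model's error at predicting phi(s_t+1) from (phi(s_t), a_t).
import torch
import torch.nn as nn
import torch.nn.functional as F

class CuriosityModule(nn.Module):
    def __init__(self, obs_dim, n_actions, feat_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                     nn.Linear(128, feat_dim))
        self.inverse = nn.Linear(2 * feat_dim, n_actions)  # predicts a_t
        self.forward_model = nn.Sequential(nn.Linear(feat_dim + n_actions, 128),
                                           nn.ReLU(), nn.Linear(128, feat_dim))
        self.n_actions = n_actions

    def forward(self, obs, next_obs, action):
        phi, phi_next = self.encoder(obs), self.encoder(next_obs)
        a_onehot = F.one_hot(action, self.n_actions).float()
        # Inverse-dynamics loss shapes the feature space, so features only
        # keep what the agent's actions can affect.
        inv_loss = F.cross_entropy(self.inverse(torch.cat([phi, phi_next], -1)), action)
        # Forward-model error in feature space = intrinsic (curiosity) reward.
        phi_pred = self.forward_model(torch.cat([phi.detach(), a_onehot], -1))
        fwd_err = 0.5 * (phi_pred - phi_next.detach()).pow(2).sum(-1)
        return fwd_err.detach(), inv_loss + fwd_err.mean()

# Usage: r_int, aux_loss = icm(obs, next_obs, action); train on r_ext + eta * r_int.
</code>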

https://arxiv.org/pdf/1802.10546.pdf Computational Theories of Curiosity-Driven Learning

https://arxiv.org/abs/1806.06505v1 A unified strategy for implementing curiosity and empowerment driven reinforcement learning

https://arxiv.org/abs/1808.05492v1 Metric Learning for Novelty and Anomaly Detection

We show that metric learning provides a better output embedding space for detecting data outside the learned distribution than cross-entropy softmax based models. This opens an opportunity for further research on how this embedding space should be learned, with constraints that could further improve the field. The presented results suggest that out-of-distribution data should not all be treated as a single type of anomaly, but rather as a continuous spectrum between novelty and anomaly. Within that spectrum, anomaly detection is the easier task, which leaves most of the difficulty in novelty detection.
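
A toy sketch of one way such an embedding space can be used for novelty scoring (illustrative, not the paper's exact procedure): distance to the nearest class centroid in the learned embedding replaces softmax confidence as the out-of-distribution score.

<code python>
# Assumes an embedding already trained with some metric loss; all array names
# below (train_emb, train_y, test_emb) are hypothetical.
import numpy as np

def class_centroids(embeddings, labels):
    """Mean embedding per class, computed from in-distribution training data."""
    return {c: embeddings[labels == c].mean(axis=0) for c in np.unique(labels)}

def novelty_score(x_emb, centroids):
    """Distance to the nearest class centroid; larger = more novel/anomalous."""
    return min(np.linalg.norm(x_emb - mu) for mu in centroids.values())

# cents = class_centroids(train_emb, train_y)
# scores = np.array([novelty_score(e, cents) for e in test_emb])
# flagged = scores > threshold   # threshold chosen on a validation set
</code>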

https://openreview.net/forum?id=SkeK3s0qKQ Episodic Curiosity through Reachability

One solution to the problem of sparse rewards is to allow the agent to create rewards for itself, thus making rewards dense and more suitable for learning. In particular, inspired by curious behaviour in animals, observing something novel could be rewarded with a bonus. Such a bonus is summed with the real task reward, making it possible for RL algorithms to learn from the combined reward. We propose a new curiosity method which uses episodic memory to form the novelty bonus.
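
A simplified sketch of an episodic novelty bonus (illustrative; the paper judges novelty with a learned reachability network, which is replaced here by a plain embedding-distance threshold):

<code python>
# Reward shaping with an episodic memory of observation embeddings: an
# observation far from everything seen this episode earns a bonus that is
# summed with the real task reward.
import numpy as np

class EpisodicBonus:
    def __init__(self, threshold=1.0, bonus=1.0):
        self.memory, self.threshold, self.bonus = [], threshold, bonus

    def reward(self, obs_embedding, task_reward):
        # Novel = far from every embedding stored this episode.
        novel = all(np.linalg.norm(obs_embedding - m) > self.threshold
                    for m in self.memory)
        if novel:
            self.memory.append(obs_embedding)
        return task_reward + (self.bonus if novel else 0.0)

    def reset(self):
        # Call at the start of each episode.
        self.memory = []
</code>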

https://arxiv.org/abs/1810.06284 CURIOUS: Intrinsically Motivated Multi-Task, Multi-Goal Reinforcement Learning

This paper proposes CURIOUS, an extension of Universal Value Function Approximators that enables intrinsically motivated agents to learn to achieve both multiple tasks and multiple goals within a unique policy, leveraging hindsight learning. Agents focus on achievable tasks first, using an automated curriculum learning mechanism that biases their attention towards tasks maximizing the absolute learning progress. This mechanism provides robustness to catastrophic forgetting (by refocusing on tasks where performance decreases) and distracting tasks (by avoiding tasks with no absolute learning progress). Furthermore, we show that having two levels of parameterization (tasks and goals within tasks) enables more efficient learning of skills in an environment with a modular physical structure (e.g. multiple objects) as compared to flat, goal-parameterized RL with hindsight experience replay.
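
A small sketch of the learning-progress curriculum idea (illustrative, not the CURIOUS implementation; the window sizes, the eps mixing weight, and the function name are assumptions):

<code python>
# Sample tasks in proportion to their absolute learning progress, i.e. the
# |change| in recent success rate. A drop in performance also counts as
# progress to be made, which refocuses attention on forgotten tasks.
import numpy as np

def task_probabilities(old_success, new_success, eps=0.1):
    """old_success/new_success: per-task success rates over two recent windows."""
    progress = np.abs(np.asarray(new_success) - np.asarray(old_success))
    if progress.sum() == 0:
        return np.full(len(progress), 1.0 / len(progress))
    p = progress / progress.sum()
    # Mix with a uniform distribution so stalled tasks are still visited.
    return (1 - eps) * p + eps / len(progress)

# Example: task 0 is flat, task 1 is improving, task 2 is being forgotten.
# task_probabilities([0.2, 0.1, 0.9], [0.2, 0.5, 0.6]) -> roughly [0.03, 0.55, 0.42]
</code>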

https://arxiv.org/abs/1810.12162 Model-Based Active Exploration

We introduce Model-Based Active eXploration (MAX), an algorithm that actively explores the environment. It minimizes data required to comprehensively model the environment by planning to observe novel events, instead of merely reacting to novelty encountered by chance. Non-stationarity induced by traditional exploration bonus techniques is avoided by constructing fresh exploration policies only at time of action. In semi-random toy environments where directed exploration is critical to make progress, our algorithm is at least an order of magnitude more efficient than strong baselines.
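
A toy sketch of disagreement-based active exploration in this spirit (illustrative; MAX itself uses a divergence over an ensemble of probabilistic dynamics models and a full planner, here reduced to variance over a deterministic ensemble and a greedy one-step choice):

<code python>
# Disagreement among an ensemble of learned dynamics models serves as the
# expected-novelty utility of a candidate action; the exploration choice is
# recomputed fresh at decision time from the models' current uncertainty.
import numpy as np

def exploration_utility(models, state, action):
    """Variance across ensemble next-state predictions (higher = more novel)."""
    preds = np.stack([m(state, action) for m in models])  # (n_models, state_dim)
    return preds.var(axis=0).sum()

def plan_action(models, state, candidate_actions):
    """Greedy one-step planner: pick the action the ensemble disagrees on most."""
    utilities = [exploration_utility(models, state, a) for a in candidate_actions]
    return candidate_actions[int(np.argmax(utilities))]
</code>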