data. In that spectrum, anomaly detection is the easier task, giving more focus to the difficulty of novelty detection.

https://openreview.net/forum?id=SkeK3s0qKQ Episodic Curiosity through Reachability

One solution to the sparse-reward problem is to allow the agent to create rewards for itself, thus making rewards dense and more suitable for learning. In particular, inspired by curious behaviour in animals, observing something novel could be rewarded with a bonus. Such a bonus is summed with the real task reward, making it possible for RL algorithms to learn from the combined reward. We propose a new curiosity method which uses episodic memory to form the novelty bonus.
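
As a rough illustration of the mechanism described above, the toy sketch below keeps an episodic memory of observation embeddings and pays a bonus whenever the current observation is far from everything stored in the episode so far. Nearest-neighbour distance in embedding space stands in for the paper's learned reachability network, and embed_fn, the threshold and the bonus scale are assumed placeholders, not values from the paper.

<code python>
import numpy as np

class EpisodicNoveltyBonus:
    """Toy episodic-memory curiosity bonus (illustration only).

    The paper trains a reachability network; here, novelty is approximated
    by the distance of the current observation's embedding to its nearest
    neighbour in the episodic memory.
    """

    def __init__(self, embed_fn, novelty_threshold=1.0, bonus_scale=0.1):
        self.embed_fn = embed_fn          # assumed: observation -> 1-D np.ndarray
        self.novelty_threshold = novelty_threshold
        self.bonus_scale = bonus_scale
        self.memory = []                  # embeddings seen in the current episode

    def reset(self):
        self.memory.clear()               # episodic: wiped at the start of every episode

    def bonus(self, observation):
        e = self.embed_fn(observation)
        if not self.memory:
            self.memory.append(e)
            return self.bonus_scale
        # distance to the closest stored embedding, used as a proxy for "not reachable"
        nearest = min(np.linalg.norm(e - m) for m in self.memory)
        if nearest > self.novelty_threshold:
            self.memory.append(e)         # only sufficiently novel states enter memory
            return self.bonus_scale
        return 0.0

# The combined reward an RL algorithm would actually learn from:
#   r_total = r_task + curiosity.bonus(obs)
</code>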

https://arxiv.org/abs/1810.06284 CURIOUS: Intrinsically Motivated Multi-Task, Multi-Goal Reinforcement Learning

This paper proposes CURIOUS, an extension of Universal Value Function Approximators that enables intrinsically motivated agents to learn to achieve both multiple tasks and multiple goals within a unique policy, leveraging hindsight learning. Agents focus on achievable tasks first, using an automated curriculum learning mechanism that biases their attention towards tasks maximizing the absolute learning progress. This mechanism provides robustness to catastrophic forgetting (by refocusing on tasks where performance decreases) and to distracting tasks (by avoiding tasks with no absolute learning progress). Furthermore, we show that having two levels of parameterization (tasks and goals within tasks) enables more efficient learning of skills in an environment with a modular physical structure (e.g. multiple objects) as compared to flat, goal-parameterized RL with hindsight experience replay.
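
The curriculum part can be sketched on its own. The fragment below only illustrates learning-progress-based task sampling and is not the authors' implementation: per-task success rates are kept in a sliding window, absolute learning progress is the gap between the recent and older halves of that window, and tasks are sampled in proportion to it, with some uniform sampling kept for exploration. The class name, window size and epsilon are made up for the example.

<code python>
import random
from collections import deque

class LPTaskSampler:
    """Curriculum sketch: sample tasks in proportion to absolute learning progress.

    Tasks whose competence is rising or falling (catastrophic forgetting) both get
    a large |LP| and are revisited; flat, unlearnable tasks are avoided.
    """

    def __init__(self, n_tasks, window=50, eps=0.2):
        self.n_tasks = n_tasks
        self.window = window
        self.eps = eps                                # residual uniform sampling
        self.results = [deque(maxlen=2 * window) for _ in range(n_tasks)]

    def record(self, task, success):
        self.results[task].append(float(success))    # 1.0 if the goal was reached, else 0.0

    def _abs_lp(self, task):
        hist = list(self.results[task])
        if len(hist) < 2 * self.window:
            return 1.0                                # optimistic: unexplored tasks look interesting
        old, recent = hist[:self.window], hist[self.window:]
        return abs(sum(recent) / self.window - sum(old) / self.window)

    def sample_task(self):
        if random.random() < self.eps:
            return random.randrange(self.n_tasks)
        weights = [self._abs_lp(t) for t in range(self.n_tasks)]
        if sum(weights) == 0.0:
            return random.randrange(self.n_tasks)
        return random.choices(range(self.n_tasks), weights=weights)[0]
</code>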

https://arxiv.org/abs/1810.12162 Model-Based Active Exploration

We introduce Model-Based Active eXploration (MAX), an algorithm that actively explores the environment. It minimizes data required to comprehensively model the environment by planning to observe novel events, instead of merely reacting to novelty encountered by chance. Non-stationarity induced by traditional exploration bonus techniques is avoided by constructing fresh exploration policies only at time of action. In semi-random toy environments where directed exploration is critical to make progress, our algorithm is at least an order of magnitude more efficient than strong baselines.
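
A caricature of the active-exploration loop, not the MAX algorithm itself, assuming an ensemble of learned dynamics models with a predict(state, action) method and a gym-style action_space.sample(): the novelty of an imagined transition is scored by ensemble disagreement, and a fresh random-shooting plan maximizing that disagreement is computed each time the agent acts, so no exploration bonus has to be folded into a value function.

<code python>
import numpy as np

def ensemble_disagreement(models, state, action):
    """Utility of a transition: spread of the ensemble's next-state predictions.

    High disagreement marks transitions the current models have not yet pinned
    down, i.e. events that are still novel to the agent."""
    preds = np.stack([m.predict(state, action) for m in models])
    return preds.var(axis=0).sum()

def plan_exploratory_action(models, state, action_space, horizon=10, n_candidates=500):
    """Random-shooting planner: return the first action of the candidate action
    sequence whose imagined rollout accumulates the most model disagreement.
    A fresh plan is built every time we act (horizon and n_candidates are
    arbitrary illustration values)."""
    best_score, best_first_action = -np.inf, None
    for _ in range(n_candidates):
        seq = [action_space.sample() for _ in range(horizon)]
        s, score = state, 0.0
        for a in seq:
            score += ensemble_disagreement(models, s, a)
            s = models[0].predict(s, a)   # roll forward with one member for simplicity
        if score > best_score:
            best_score, best_first_action = score, seq[0]
    return best_first_action
</code>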