exploration [2018/09/28 21:28]
exploration [2018/10/31 16:36] (current)
combined reward. We propose a new curiosity method which uses episodic memory to form the novelty bonus
https://arxiv.org/abs/1810.06284 CURIOUS: Intrinsically Motivated Multi-Task, Multi-Goal Reinforcement Learning
This paper proposes CURIOUS, an extension of Universal Value Function Approximators that enables intrinsically motivated agents to learn to achieve both multiple tasks and multiple goals within a unique policy, leveraging hindsight learning. Agents focus on achievable tasks first, using an automated curriculum learning mechanism that biases their attention towards tasks maximizing the absolute learning progress. This mechanism provides robustness to catastrophic forgetting (by refocusing on tasks where performance decreases) and distracting tasks (by avoiding tasks with no absolute learning progress). Furthermore, we show that having two levels of parameterization (tasks and goals within tasks) enables more efficient learning of skills in an environment with a modular physical structure (e.g. multiple objects) as compared to flat, goal-parameterized RL with hindsight experience replay.
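The curriculum idea in the CURIOUS abstract (bias attention towards tasks with high absolute learning progress) can be sketched roughly as below. This is an illustrative sketch, not the authors' implementation: the function names, the `window` size, and the `epsilon` uniform-mixing term are assumptions introduced here, and learning progress is approximated as the absolute difference between recent and older mean success rates.

```python
import random


def absolute_learning_progress(history, window=5):
    """Absolute change in mean success between the two most recent windows.

    `history` is a list of 0/1 outcomes for one task (hypothetical format).
    Returns 0.0 until there are enough outcomes to compare two windows.
    """
    if len(history) < 2 * window:
        return 0.0
    recent = history[-window:]
    older = history[-2 * window:-window]
    return abs(sum(recent) / window - sum(older) / window)


def task_probabilities(histories, epsilon=0.2, window=5):
    """Sampling probabilities over tasks.

    Mixes uniform sampling (weight `epsilon`, an assumption of this sketch)
    with sampling proportional to absolute learning progress.
    """
    n = len(histories)
    lps = [absolute_learning_progress(h, window) for h in histories]
    total = sum(lps)
    if total == 0:
        # No measurable progress anywhere yet: fall back to uniform sampling.
        return [1.0 / n] * n
    return [epsilon / n + (1 - epsilon) * lp / total for lp in lps]


def sample_task(histories, rng=random, **kwargs):
    """Draw one task index according to task_probabilities."""
    probs = task_probabilities(histories, **kwargs)
    return rng.choices(range(len(histories)), weights=probs, k=1)[0]


# Task 0 is improving, task 1 never succeeds (a distracting task),
# task 2 is getting worse (forgetting).
histories = [
    [0] * 5 + [1] * 5,
    [0] * 10,
    [1] * 5 + [0] * 5,
]
probs = task_probabilities(histories)
```

Because the progress measure is an absolute value, the declining task receives as much attention as the improving one, while the task with no progress is sampled only through the uniform term. This mirrors the two robustness properties the abstract describes.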
https://arxiv.org/abs/1810.12162 Model-Based Active Exploration
We introduce Model-Based Active eXploration (MAX), an algorithm that actively explores the environment. It minimizes data required to comprehensively model the environment by planning to observe novel events, instead of merely reacting to novelty encountered by chance. Non-stationarity induced by traditional exploration bonus techniques is avoided by constructing fresh exploration policies only at time of action. In semi-random toy environments where directed exploration is critical to make progress, our algorithm is at least an order of magnitude more efficient than strong baselines.
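One way to picture "planning to observe novel events" is to score candidate actions by how much a set of learned models disagree about their outcome, and to recompute that choice each time an action is needed. The sketch below is a greedy, one-step simplification under stated assumptions: the abstract does not specify the novelty measure, so the use of ensemble prediction variance, the scalar states, and the names `disagreement` and `pick_exploratory_action` are all hypothetical.

```python
from statistics import pvariance


def disagreement(models, state, action):
    """Variance of the models' next-state predictions (assumed novelty proxy).

    Each model is a callable (state, action) -> predicted scalar next state.
    """
    predictions = [model(state, action) for model in models]
    return pvariance(predictions)


def pick_exploratory_action(models, state, actions):
    """Choose the action whose outcome the models disagree about most.

    The choice is recomputed from the current models on every call, so no
    exploration policy persists between decisions.
    """
    return max(actions, key=lambda a: disagreement(models, state, a))


# Three toy models that agree when the action is 0 and diverge as it grows.
models = [
    lambda s, a: s + a,
    lambda s, a: s + 2 * a,
    lambda s, a: s + 3 * a,
]
chosen = pick_exploratory_action(models, state=0, actions=[0, 1, 2])
```

The agent here seeks out the transition it expects to be most informative before ever experiencing it, rather than collecting a bonus after stumbling on a novel state. A fuller version would plan over multiple steps instead of a single greedy choice.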