https://arxiv.org/abs/1801.03354 Planning with Pixels in (Almost) Real Time
Recently, width-based planning methods have been shown to yield state-of-the-art results in the Atari 2600 video games. For this, the states were associated with the (RAM) memory states of the simulator. In this work, we consider the same planning problem but using the screen instead. By using the same visual inputs, the planning results can be compared with those of humans and learning methods. We show that the planning approach, out of the box and without training, results in scores that compare well with those obtained by humans and learning methods, and moreover, by developing an episodic, rollout version of the IW(k) algorithm, we show that such scores can be obtained in almost real time.
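
For context, width-based planners such as IW(1) run a breadth-first search but prune any state that does not make at least one atomic feature true for the first time. Below is a minimal sketch of that novelty test, assuming a generic simulator interface; the names step, features and actions are hypothetical placeholders, not the authors' code. The paper's episodic, rollout variant replaces the breadth-first queue with rollouts, but the width-1 pruning rule is the same core idea.

<code python>
from collections import deque

def iw1_plan(initial_state, step, features, actions, max_nodes=10000):
    """Minimal IW(1)-style search: breadth-first search that prunes any
    state whose feature set contains no atom seen for the first time.
    Assumed interfaces: step(state, action) -> (next_state, reward, done);
    features(state) -> iterable of hashable atoms (e.g. screen features)."""
    seen_atoms = set(features(initial_state))
    queue = deque([(initial_state, [], 0.0)])  # (state, action prefix, return)
    best_plan, best_return = [], float("-inf")
    expanded = 0

    while queue and expanded < max_nodes:
        state, plan, ret = queue.popleft()
        expanded += 1
        for a in actions:
            nxt, reward, done = step(state, a)
            new_ret = ret + reward
            if new_ret > best_return:
                best_return, best_plan = new_ret, plan + [a]
            if done:
                continue
            atoms = set(features(nxt))
            novel = atoms - seen_atoms      # atoms made true for the first time
            if not novel:
                continue                    # width-1 test: prune non-novel state
            seen_atoms |= atoms
            queue.append((nxt, plan + [a], new_ret))

    return best_plan, best_return
</code>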

https://openreview.net/forum?id=HJw8fAgA-&noteId=HJw8fAgA Learning Dynamic State Abstractions for Model-Based Reinforcement Learning
RL agents that use Monte-Carlo rollouts of the learned abstract models as features for decision making outperform strong model-free baselines on the game MS_PACMAN, demonstrating the benefits of planning using learned dynamic state abstractions.
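
As a rough illustration of the idea in that abstract, a learned transition model can be rolled out for a few steps and the rollouts summarized into a feature vector that a model-free controller consumes alongside its usual observation. This is only a sketch of that pattern; model, actions and the summary scheme below are assumptions, not the paper's architecture.

<code python>
import numpy as np

def rollout_features(model, state, actions, horizon=5, n_rollouts=8, rng=None):
    """Summarize short Monte-Carlo rollouts of a learned model into a
    fixed-size feature vector. Assumed interface: model(state, action) ->
    (next_state_vector, predicted_reward) for the learned dynamics model."""
    rng = rng or np.random.default_rng()
    summaries = []
    for _ in range(n_rollouts):
        s, total_reward = state, 0.0
        for _ in range(horizon):
            a = actions[rng.integers(len(actions))]  # random action sequence
            s, r = model(s, a)
            total_reward += r
        # Each rollout contributes its final predicted state plus its return.
        summaries.append(np.concatenate([np.asarray(s, dtype=np.float32),
                                         [total_reward]]))
    # Averaging over rollouts gives one feature vector for the controller.
    return np.mean(summaries, axis=0)
</code>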