This shows you the differences between two versions of the page.

recurrent_reinforcement_learning [2018/02/28 11:43]
recurrent_reinforcement_learning [2018/12/05 10:47] (current)
Line 19: Line 19:
 where simple distance metrics may no longer be effective.
 +https://arxiv.org/pdf/1803.02811.pdf Accelerated Methods for Deep Reinforcement Learning
 +We confirm that both policy gradient and Q-value learning algorithms can be adapted to learn using many parallel simulator instances. We further find it possible to train using batch sizes considerably larger than are standard, without negatively affecting sample complexity or final performance. We leverage these facts to build a unified framework for parallelization that dramatically hastens experiments in both classes of algorithm.
 +https://arxiv.org/abs/1808.10552 Directed Exploration in PAC Model-Free Reinforcement Learning
 +https://arxiv.org/abs/1708.05866v2 A Brief Survey of Deep Reinforcement Learning
 +https://papers.nips.cc/paper/8200-non-delusional-q-learning-and-value-iteration.pdf Non-delusional Q-learning and value iteration