Dealing with sparse rewards is one of the biggest challenges in Reinforcement Learning (RL). We present a novel technique called Hindsight Experience Replay which allows sample-efficient learning from rewards which are sparse and binary and therefore avoid the need for complicated reward engineering. It can be combined with an arbitrary off-policy RL algorithm and may be seen as a form of implicit curriculum.
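
The relabeling idea lends itself to a short sketch. Below is a minimal, illustrative version assuming a goal-conditioned setting where the achieved goal is simply the state that was reached; the buffer class, the "final goal" relabeling strategy, and the toy 1-D task are assumptions for illustration, not the paper's implementation.

<code python>
# Minimal sketch of hindsight relabeling with a sparse, binary-style reward.
# Assumes transitions of the form (state, action, next_state) and that the
# achieved goal equals the state reached; all names here are illustrative.
import random
from collections import deque

def sparse_reward(achieved_goal, goal, tol=1e-3):
    """0 if the achieved goal matches the desired goal, else -1."""
    return 0.0 if abs(achieved_goal - goal) <= tol else -1.0

class HindsightReplayBuffer:
    """Stores each transition with its original goal and, in hindsight,
    with the goal actually achieved at the end of the episode."""
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def store_episode(self, episode, goal):
        achieved_final = episode[-1][2]  # state reached at episode end
        for state, action, next_state in episode:
            # Original transition (usually a failure, reward -1).
            self.buffer.append((state, action, next_state, goal,
                                sparse_reward(next_state, goal)))
            # Hindsight transition: pretend the achieved outcome was the goal,
            # so the sparse reward becomes informative for off-policy learning.
            self.buffer.append((state, action, next_state, achieved_final,
                                sparse_reward(next_state, achieved_final)))

    def sample(self, batch_size):
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))

# Toy usage: a 1-D "reach position 5" episode that only gets to position 3.
episode = [(0.0, +1, 1.0), (1.0, +1, 2.0), (2.0, +1, 3.0)]
buffer = HindsightReplayBuffer()
buffer.store_episode(episode, goal=5.0)
print(buffer.sample(4))
</code>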
  
https://openreview.net/forum?id=r1lyTjAqYX Recurrent Experience Replay in Distributed Reinforcement Learning

We investigate the effects of parameter lag resulting in representational drift and recurrent state staleness and empirically derive an improved training strategy.
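
One way to counteract recurrent state staleness is to store the recurrent state alongside each replayed sequence and re-run a "burn-in" prefix under the current parameters before computing the training loss. The sketch below illustrates that idea with assumed components (a small PyTorch LSTM Q-network, a fixed burn-in length, random stand-in data); it is not the paper's training setup.

<code python>
# Illustrative sketch: replaying recurrent sequences with a stored initial
# state and a burn-in prefix that refreshes the hidden state under the
# current parameters. Network sizes and lengths are arbitrary.
import torch
import torch.nn as nn

class RecurrentQNet(nn.Module):
    def __init__(self, obs_dim=8, hidden=64, n_actions=4):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, obs_seq, state):
        out, new_state = self.lstm(obs_seq, state)
        return self.head(out), new_state

def q_values_with_burn_in(net, obs_seq, stored_state, burn_in=10):
    """obs_seq: (batch, time, obs_dim); stored_state: (h, c) recorded when the
    sequence was collected. The burn-in segment refreshes the hidden state
    with current parameters; gradients flow only through the remainder."""
    burn_obs, train_obs = obs_seq[:, :burn_in], obs_seq[:, burn_in:]
    with torch.no_grad():
        _, refreshed_state = net(burn_obs, stored_state)
    q, _ = net(train_obs, refreshed_state)
    return q  # (batch, time - burn_in, n_actions)

# Toy usage with random data standing in for a sampled replay sequence.
net = RecurrentQNet()
batch, time, obs_dim, hidden = 2, 40, 8, 64
obs_seq = torch.randn(batch, time, obs_dim)
stored_state = (torch.zeros(1, batch, hidden), torch.zeros(1, batch, hidden))
print(q_values_with_burn_in(net, obs_seq, stored_state).shape)
</code>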

https://arxiv.org/abs/1809.10635v1 Generative replay with feedback connections as a general strategy for continual learning

Standard artificial neural networks suffer from the well-known issue of catastrophic forgetting, making continual or lifelong learning problematic. Recently, numerous methods have been proposed for continual learning, but due to differences in evaluation protocols it is difficult to directly compare their performance. To enable more meaningful comparisons, we identified three distinct continual learning scenarios based on whether task identity is known and, if it is not, whether it needs to be inferred. Performing the split and permuted MNIST task protocols according to each of these scenarios, we found that regularization-based approaches (e.g., elastic weight consolidation) failed when task identity needed to be inferred. In contrast, generative replay combined with distillation (i.e., using class probabilities as “soft targets”) achieved superior performance in all three scenarios. In addition, we reduced the computational cost of generative replay by integrating the generative model into the main model by equipping it with generative feedback connections. This Replay-through-Feedback approach substantially shortened training time with no or negligible loss in performance. We believe this to be an important first step towards making the powerful technique of generative replay scalable to real-world continual learning applications.
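
A minimal sketch of combining generative replay with distillation: pseudo-samples from a generator stand in for past data, and the previous model's class probabilities serve as soft targets for the current model. The generator placeholder, temperature, and loss weighting below are illustrative assumptions; the sketch does not implement the Replay-through-Feedback architecture itself.

<code python>
# Illustrative training step: cross-entropy on current-task data plus a
# distillation (soft-target) loss on generated replay samples.
import torch
import torch.nn as nn
import torch.nn.functional as F

def replay_distillation_step(current_model, previous_model, generator,
                             new_x, new_y, optimizer,
                             n_replay=32, temperature=2.0, replay_weight=0.5):
    """One training step mixing current-task data with replayed samples."""
    optimizer.zero_grad()

    # Standard cross-entropy on the current task's real data.
    loss_new = F.cross_entropy(current_model(new_x), new_y)

    # Replay: generate pseudo-samples of past data and distill the previous
    # model's soft predictions into the current model.
    with torch.no_grad():
        replay_x = generator(n_replay)
        soft_targets = F.softmax(previous_model(replay_x) / temperature, dim=1)
    log_probs = F.log_softmax(current_model(replay_x) / temperature, dim=1)
    loss_replay = F.kl_div(log_probs, soft_targets, reduction="batchmean")

    loss = (1 - replay_weight) * loss_new + replay_weight * loss_replay
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage with stand-ins for the models and the generative model.
current = nn.Linear(784, 10)
previous = nn.Linear(784, 10)
generator = lambda n: torch.randn(n, 784)  # placeholder "generator"
opt = torch.optim.SGD(current.parameters(), lr=0.1)
x, y = torch.randn(16, 784), torch.randint(0, 10, (16,))
print(replay_distillation_step(current, previous, generator, x, y, opt))
</code>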