Differences

This shows you the differences between two versions of the page.

random_orthogonal_initialization [2018/03/05 18:59]
admin
random_orthogonal_initialization [2018/07/01 05:32]
admin
Line 365:
 that remain well-conditioned as the depth goes to
 infinity, as well as theoretical conditions for their existence.
 +
 +https://arxiv.org/abs/1804.03758 Universal Successor Representations for Transfer Reinforcement Learning
 +
 +The objective of transfer reinforcement learning is to generalize from a set of previous tasks to unseen new tasks. In this work, we focus on the transfer scenario where the dynamics among tasks are the same, but their goals differ. Although general value function (Sutton et al., 2011) has been shown to be useful for knowledge transfer, learning a universal value function can be challenging in practice. To attack this, we propose (1) to use universal successor representations (USR) to represent the transferable knowledge and (2) a USR approximator (USRA) that can be trained by interacting with the environment. Our experiments show that USR can be effectively applied to new tasks, and the agent initialized by the trained USRA can achieve the goal considerably faster than random initialization.
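
The idea of reusing one representation across goals can be illustrated with the classical tabular successor representation. This is a minimal sketch under stated assumptions, not the paper's USRA: the 3-state chain, the fixed policy, the reward vectors, and the function names are all hypothetical, and the paper's universal variant additionally conditions on the goal and uses trained function approximators. For a fixed policy with transition matrix P, the successor representation M satisfies M = I + gamma * P * M, and the value for any reward vector r is M r, so changing the goal only changes r.

```python
def successor_representation(P, gamma, iters=1000):
    """Tabular SR for a fixed policy via fixed-point iteration M <- I + gamma * P @ M."""
    n = len(P)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    M = [row[:] for row in identity]
    for _ in range(iters):
        M = [[identity[i][j] + gamma * sum(P[i][k] * M[k][j] for k in range(n))
              for j in range(n)]
             for i in range(n)]
    return M


def values(M, r):
    """State values for reward vector r: V = M @ r."""
    return [sum(m_ij * r_j for m_ij, r_j in zip(row, r)) for row in M]


# Hypothetical dynamics: 0 -> 1 -> 2, with state 2 absorbing.
P = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [0.0, 0.0, 1.0]]
gamma = 0.9
M = successor_representation(P, gamma)

# Same dynamics, two different goals (reward vectors); M is computed once.
v_goal_a = values(M, [0.0, 0.0, 1.0])  # reward for being in state 2
v_goal_b = values(M, [0.0, 1.0, 0.0])  # reward for being in state 1
print(v_goal_a, v_goal_b)
```

Here the policy is held fixed for simplicity; in a transfer setting the policy would generally depend on the goal, which is one reason a goal-conditioned representation is needed.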
 +
 +https://arxiv.org/abs/1806.10909 ResNet with one-neuron hidden layers is a Universal Approximator