Successful applications of reinforcement learning in real-world problems often require dealing with partially observable states. It is in general very challenging to construct and infer hidden states as they often depend on the agent's entire interaction history and may require substantial domain knowledge. In this work, we investigate a deep-learning approach to learning the representation of states in partially observable tasks, with minimal prior knowledge of the domain. In particular, we propose a new family of hybrid models that combines the strength of both supervised learning (SL) and reinforcement learning (RL), trained in a joint fashion: The SL component can be a recurrent neural network (RNN) or its long short-term memory (LSTM) version, which is equipped with the desired property of being able to capture long-term dependency on history, thus providing an effective way of learning the representation of hidden states. The RL component is a deep Q-network (DQN) that learns to optimize the control for maximizing long-term rewards. Extensive experiments in a direct mailing campaign problem demonstrate the effectiveness and advantages of the proposed approach, which performs the best among a set of previous state-of-the-art methods.
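
As a rough illustration of the hybrid idea described above, the sketch below wires an LSTM history encoder to a Q-value head in PyTorch. It is a minimal sketch, not the paper's architecture; the class name, layer sizes, and the toy usage at the end are all assumptions.

<code python>
# Minimal sketch of an LSTM state encoder feeding a DQN-style head.
# Not the paper's exact model; sizes and names are assumptions.
import torch
import torch.nn as nn

class RecurrentQNetwork(nn.Module):
    def __init__(self, obs_dim, num_actions, hidden_dim=64):
        super().__init__()
        # SL component: the LSTM summarizes the observation history,
        # standing in for the unobserved true state.
        self.encoder = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
        # RL component: a DQN head maps the learned state to Q-values.
        self.q_head = nn.Linear(hidden_dim, num_actions)

    def forward(self, obs_seq, hidden=None):
        # obs_seq: (batch, time, obs_dim) sequence of observations
        summary, hidden = self.encoder(obs_seq, hidden)
        q_values = self.q_head(summary[:, -1])  # Q-values at the last step
        return q_values, hidden

# Toy usage: pick the greedy action from a 10-step observation history.
net = RecurrentQNetwork(obs_dim=8, num_actions=4)
history = torch.randn(1, 10, 8)
q_values, _ = net(history)
action = q_values.argmax(dim=-1).item()
</code>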

https://openreview.net/pdf?id=Skw0n-W0Z TEMPORAL DIFFERENCE MODELS: MODEL-FREE DEEP RL FOR MODEL-BASED CONTROL

Our temporal difference models can be viewed both as goal-conditioned value functions and implicit dynamics models, which enables them to be trained efficiently on off-policy data while still minimizing the effects of model bias. As a result, they achieve asymptotic performance that compares favorably with model-free algorithms, but with a sample complexity that is comparable to purely model-based methods.

While the experiments focus primarily on the new RL algorithm, the relationship between model-based and model-free RL explored in this paper provides a number of avenues for future work. We demonstrated the use of TDMs with a very basic planning approach, but further exploring how TDMs can be incorporated into powerful constrained optimization methods for model-predictive control or trajectory optimization is an exciting avenue for future work. Another direction for future work is to further explore how TDMs can be applied to complex state representations, such as images, where simple distance metrics may no longer be effective.
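
A rough sketch of the core mechanic, under my own reading of the excerpt: the Q-function also conditions on a goal g and a remaining horizon tau, and the bootstrap target switches to a negative goal distance once tau reaches zero. The network shape, the Euclidean distance, and the assumption that goals live in observation space are simplifications, not the authors' code.

<code python>
# Goal- and horizon-conditioned Q-function in the spirit of a TDM.
# Shapes, the distance metric, and how a_next is chosen are assumptions.
import torch
import torch.nn as nn

class TDM(nn.Module):
    def __init__(self, obs_dim, action_dim, goal_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + action_dim + goal_dim + 1, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s, a, g, tau):
        # tau: remaining horizon, float tensor of shape (batch,)
        x = torch.cat([s, a, g, tau.unsqueeze(-1)], dim=-1)
        return self.net(x).squeeze(-1)

def tdm_target(target_net, s_next, a_next, g, tau):
    # At tau == 0 the target is the negative distance to the goal
    # (the implicit dynamics model); otherwise bootstrap with tau - 1.
    # a_next is assumed to come from the current policy.
    terminal = (tau == 0).float()
    dist = -torch.norm(s_next - g, dim=-1)
    bootstrap = target_net(s_next, a_next, g, torch.clamp(tau - 1, min=0))
    return terminal * dist + (1.0 - terminal) * bootstrap
</code>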

https://arxiv.org/pdf/1803.02811.pdf Accelerated Methods for Deep Reinforcement Learning

We confirm that both policy gradient and Q-value learning algorithms can be adapted to learn using many parallel simulator instances. We further find it possible to train using batch sizes considerably larger than are standard, without negatively affecting sample complexity or final performance. We leverage these facts to build a unified framework for parallelization that dramatically hastens experiments in both classes of algorithm.
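
The excerpt above is about sampling with many simulator copies at once; the toy loop below shows that pattern with gymnasium's synchronous vector API. It only illustrates collecting a large batch per iteration and is not the unified framework from the paper; the environment id, number of copies, and random policy are placeholders.

<code python>
# Toy illustration of stepping many simulator instances in lockstep so each
# sampling iteration yields a large batch of transitions. Not the paper's
# framework; environment, env count, and the random policy are placeholders.
import gymnasium as gym
import numpy as np

num_envs = 64  # many parallel simulator instances
envs = gym.vector.SyncVectorEnv(
    [lambda: gym.make("CartPole-v1") for _ in range(num_envs)]
)

obs, _ = envs.reset(seed=0)
batch = []
for _ in range(32):  # 32 steps x 64 envs = 2048 transitions per iteration
    actions = np.array(
        [envs.single_action_space.sample() for _ in range(num_envs)]
    )
    next_obs, rewards, terminated, truncated, _ = envs.step(actions)
    batch.append((obs, actions, rewards, next_obs, terminated | truncated))
    obs = next_obs

print(len(batch) * num_envs, "transitions collected this iteration")
</code>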

https://arxiv.org/abs/1808.10552 Directed Exploration in PAC Model-Free Reinforcement Learning

https://arxiv.org/abs/1708.05866v2 A Brief Survey of Deep Reinforcement Learning

https://papers.nips.cc/paper/8200-non-delusional-q-learning-and-value-iteration.pdf Non-delusional Q-learning and value iteration