https://www.microsoft.com/en-us/research/wp-content/uploads/2017/04/williams2017acl.pdf Hybrid Code Networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning

https://arxiv.org/pdf/1705.08439v1.pdf Thinking Fast and Slow with Deep Learning and Tree Search

Solving sequential decision making problems, such as text parsing, robotic control, and game playing, requires a combination of planning policies and generalisation of those plans. In this paper, we present Expert Iteration, a novel algorithm which decomposes the problem into separate planning and generalisation tasks. Planning new policies is performed by tree search, while a deep neural network generalises those plans. In contrast, standard Deep Reinforcement Learning algorithms rely on a neural network not only to generalise plans, but to discover them too. We show that our method substantially outperforms Policy Gradients in the board game Hex, winning 84.4% of games against it when trained for equal time.
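The abstract only outlines the Expert Iteration (ExIt) loop. What follows is a minimal Python sketch of that planning/generalisation split under stated assumptions, not the paper's implementation. Single-pile Nim (take 1 to 3 stones; taking the last stone wins) stands in for Hex. A lookup table (`TabularApprentice`, a hypothetical name) stands in for the deep neural network. The expert is a small PUCT-style Monte Carlo tree search with random rollouts whose prior comes from the apprentice. States come from apprentice self-play, the expert labels each one with its normalised visit counts, and the apprentice is then fitted to those labels. All function names and parameters (`expert_search`, `n_sims`, `c_puct`, `lr`) are illustrative.

```python
import math
import random
from collections import defaultdict

# Toy stand-in for Hex (assumption): single-pile Nim where a move removes
# 1-3 stones and the player who takes the last stone wins.
MAX_TAKE = 3


def legal_moves(pile):
    return [m for m in range(1, MAX_TAKE + 1) if m <= pile]


def rollout(pile):
    """Uniformly random playout: +1 if the player to move at `pile` wins, else -1."""
    sign = 1  # +1 while it is the original player's turn
    while pile > 0:
        pile -= random.choice(legal_moves(pile))
        sign = -sign
    # Whoever emptied the pile won; `sign` now marks the player left to move (the loser).
    return -sign


class TabularApprentice:
    """Hypothetical stand-in for the deep network: one move distribution per pile."""

    def __init__(self, max_pile):
        self.table = {p: {m: 1 / len(legal_moves(p)) for m in legal_moves(p)}
                      for p in range(1, max_pile + 1)}

    def __call__(self, pile):
        return self.table[pile]

    def fit(self, dataset, lr=0.5):
        # Imitation step: move each row toward the expert's visit distribution.
        for pile, target in dataset:
            row = self.table[pile]
            for m in row:
                row[m] += lr * (target[m] - row[m])


def expert_search(root, apprentice, n_sims=200, c_puct=1.5):
    """Expert: PUCT-style tree search whose prior comes from the apprentice."""
    N = defaultdict(int)    # visits per (pile, move)
    W = defaultdict(float)  # summed value per (pile, move), from the mover's view
    expanded = set()

    def simulate(pile):
        # Returns the value of `pile` for the player to move there.
        if pile == 0:
            return -1.0  # the opponent just took the last stone
        if pile not in expanded:
            expanded.add(pile)
            return rollout(pile)
        moves = legal_moves(pile)
        prior = apprentice(pile)
        total = sum(N[(pile, m)] for m in moves)

        def score(m):
            n = N[(pile, m)]
            q = W[(pile, m)] / n if n else 0.0
            return q + c_puct * prior[m] * math.sqrt(total + 1) / (1 + n)

        m = max(moves, key=score)
        value = -simulate(pile - m)  # the child's value is from the opponent's view
        N[(pile, m)] += 1
        W[(pile, m)] += value
        return value

    for _ in range(n_sims):
        simulate(root)
    visits = {m: N[(root, m)] for m in legal_moves(root)}
    total = sum(visits.values())
    return {m: v / total for m, v in visits.items()}  # training target


def expert_iteration(max_pile=10, iterations=3, seed=0):
    random.seed(seed)
    apprentice = TabularApprentice(max_pile)
    for _ in range(iterations):
        dataset = []
        for start in range(1, max_pile + 1):
            pile = start
            # Apprentice self-play generates states; the expert labels each one.
            while pile > 0:
                dataset.append((pile, expert_search(pile, apprentice)))
                row = apprentice(pile)
                moves = list(row)
                pile -= random.choices(moves, weights=[row[m] for m in moves])[0]
        apprentice.fit(dataset)  # generalisation step
    return apprentice


if __name__ == "__main__":
    policy = expert_iteration()
    for pile in range(1, 11):
        print(pile, max(policy(pile), key=policy(pile).get))
```

Search statistics are keyed by pile size, which is the whole game state in Nim, so transpositions share one node. In this toy the move that wins under perfect play is `pile % 4` whenever the pile is not a multiple of 4, which gives a quick check on what the apprentice has learned.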

https://arxiv.org/pdf/1709.06977.pdf Deep Reinforcement Learning for Dexterous Manipulation with Concept Networks

https://arxiv.org/pdf/1712.01208.pdf The Case for Learned Index Structures