https://arxiv.org/abs/1705.02670v1 Metacontrol for Adaptive Imagination-Based Optimization

Rather than learning a single, fixed policy for solving all instances of a task, we introduce a metacontroller which learns to optimize a sequence of “imagined” internal simulations over predictive models of the world in order to construct a more informed, and more economical, solution.
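
The trade-off between a more informed and a more economical solution can be illustrated with a minimal sketch. It is not the paper's architecture: the learned metacontroller is replaced here by a hand-written stopping rule, the predictive model is a toy one-dimensional function, and every name (`imagine_then_act`, `model`, `step_cost`, `learning_rate`) is hypothetical. The loop refines a candidate action through imagined model evaluations and stops once another simulation is no longer worth its cost.

```python
from typing import Callable, List, Tuple


def imagine_then_act(
    model: Callable[[float], float],
    target: float,
    initial_action: float,
    step_cost: float = 0.01,
    learning_rate: float = 0.25,
    max_steps: int = 50,
) -> Tuple[float, int]:
    """Refine an action with imagined simulations until they stop paying off.

    Assumption: a fixed rule stands in for the learned metacontroller.
    It stops when the last simulation reduced the imagined loss by less
    than `step_cost`, the (hypothetical) price of one more simulation.
    """
    action = initial_action
    losses: List[float] = []

    for _ in range(max_steps):
        # One "imagined" simulation: query the predictive model, not the world.
        error = model(action) - target
        losses.append(error ** 2)

        # Stand-in metacontrol decision: is another simulation worth its cost?
        if len(losses) > 1 and losses[-2] - losses[-1] < step_cost:
            break

        # Proportional correction; assumes the model increases with the action.
        action -= learning_rate * error

    return action, len(losses)


def toy_model(action: float) -> float:
    """Toy predictive model (an assumption): the outcome is twice the action."""
    return 2.0 * action


if __name__ == "__main__":
    cheap = imagine_then_act(toy_model, target=4.0, initial_action=0.0, step_cost=0.01)
    costly = imagine_then_act(toy_model, target=4.0, initial_action=0.0, step_cost=1.0)
    print("cheap simulations: ", cheap)
    print("costly simulations:", costly)
```

With these toy settings, the run with cheap simulations uses 8 imagined evaluations and ends closer to the target, while the run with expensive simulations stops after 4 and accepts a rougher action. In the paper this decision is learned rather than hard-coded.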