https://arxiv.org/pdf/1703.04529.pdf Task-based End-to-end Model Learning

As machine learning techniques have become more ubiquitous, it has become common to see machine learning prediction algorithms operating within some larger process. However, the criteria by which we train machine learning algorithms often differ from the ultimate criteria on which we evaluate them. This paper proposes an end-to-end approach for learning probabilistic machine learning models within the context of stochastic programming, in a manner that directly captures the ultimate task-based objective for which they will be used. We then present two experimental evaluations of the proposed approach, one as applied to a generic inventory stock problem and the second to a real-world electrical grid scheduling task. In both cases, we show that the proposed approach can outperform both a traditional modeling approach and a purely black-box policy optimization approach.
https://arxiv.org/abs/1611.03824 Learning to Learn for Global Optimization of Black Box Functions

We present a learning-to-learn approach for training recurrent neural networks to perform black-box global optimization. In the meta-learning phase we use a large set of smooth target functions to learn a recurrent neural network (RNN) optimizer, which is either a long short-term memory network or a differentiable neural computer. After learning, the RNN can be applied to learn policies in reinforcement learning, as well as other black-box learning tasks, including continuous correlated bandits and experimental design. We compare this approach to Bayesian optimization, with emphasis on the issues of computation speed, horizon length, and exploration-exploitation trade-offs.
https://arxiv.org/abs/1606.03152 Policy Networks with Two-Stage Training for Dialogue Systems

First, we show that, on summary state and action spaces, deep Reinforcement Learning (RL) outperforms Gaussian Process methods.

We show that a deep RL method based on an actor-critic architecture can exploit a small amount of data very efficiently. Indeed, with only a few hundred dialogues collected with a handcrafted policy, the actor-critic deep learner can be considerably bootstrapped from a combination of supervised and batch RL.
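A hypothetical sketch of that two-stage structure, assuming PyTorch, a batch of logged (state, action, return) tuples from the handcrafted policy, and simplified supervised-then-actor-critic updates (the real system's state/action representations and off-policy corrections are omitted):

<code python>
# Hypothetical two-stage sketch: supervise the actor on dialogues logged from a
# handcrafted policy, then refine with a batch actor-critic update on the same data.
# Network sizes, data format, and update rules are illustrative assumptions.
import torch
import torch.nn as nn

n_states, n_actions = 100, 20
actor = nn.Sequential(nn.Linear(n_states, 64), nn.ReLU(), nn.Linear(64, n_actions))
critic = nn.Sequential(nn.Linear(n_states, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(list(actor.parameters()) + list(critic.parameters()), lr=1e-3)

# Placeholder batch of logged transitions (state, action, return) from the handcrafted policy.
states = torch.randn(512, n_states)
actions = torch.randint(0, n_actions, (512,))
returns = torch.randn(512, 1)

# Stage 1: supervised pretraining -- imitate the handcrafted policy's actions.
for _ in range(50):
    loss = nn.functional.cross_entropy(actor(states), actions)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Stage 2: batch RL -- actor-critic update on the same logged dialogues.
for _ in range(50):
    values = critic(states)
    advantage = (returns - values).detach()
    log_probs = nn.functional.log_softmax(actor(states), dim=-1)
    actor_loss = -(log_probs.gather(1, actions.unsqueeze(1)) * advantage).mean()
    critic_loss = nn.functional.mse_loss(values, returns)
    opt.zero_grad()
    (actor_loss + critic_loss).backward()
    opt.step()
</code>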