This paper presents the necessary theoretical extensions of Alpha Zero to deal with a continuous action space. We also provide some preliminary experiments on the Pendulum swing-up task, empirically showing the feasibility of our approach. Thereby, this work provides a first step towards the application of iterated search and learning in domains with a continuous action space.
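
The paper itself works out the exact extensions; purely as an illustration of one standard way to run tree search over a continuous action space, the sketch below uses progressive widening, where a node may add a newly sampled action only while its child count stays below c_pw * visits^alpha. The Pendulum torque range, the constants, and the UCT-style selection rule are assumptions for illustration, not details taken from the paper.

<code python>
import math
import random

class Node:
    """MCTS node over a continuous action space (illustrative sketch)."""
    def __init__(self, state):
        self.state = state
        self.children = {}   # sampled action (float) -> child Node (None until expanded)
        self.q = {}          # action -> running mean return
        self.n = {}          # action -> visit count
        self.visits = 0

def select_action(node, c_pw=1.0, alpha=0.5, c_uct=1.4):
    # Progressive widening: sample a brand-new action while the child count
    # is below c_pw * visits**alpha, otherwise pick among existing children.
    if len(node.children) < c_pw * max(node.visits, 1) ** alpha:
        a = random.uniform(-2.0, 2.0)          # assumed torque range for Pendulum
        node.children[a] = None
        node.q[a], node.n[a] = 0.0, 0
        return a
    # UCT-style selection over the already-sampled actions.
    return max(node.children, key=lambda a: node.q[a]
               + c_uct * math.sqrt(math.log(node.visits + 1) / (node.n[a] + 1)))

def backup(node, action, ret):
    # Standard incremental-mean backup of a simulated return.
    node.visits += 1
    node.n[action] += 1
    node.q[action] += (ret - node.q[action]) / node.n[action]
</code>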

https://arxiv.org/abs/1712.00679v2 GANGs: Generative Adversarial Network Games

The size of these games precludes exact solution methods; therefore we define resource-bounded best responses (RBBRs), and a resource-bounded Nash equilibrium (RB-NE) as a pair of mixed strategies such that neither G nor C can find a better RBBR. The RB-NE solution concept is richer than the notion of "local Nash equilibria" in that it captures not only failures of escaping local optima of gradient descent, but applies to any approximate best response computation, including methods with random restarts. To validate our approach, we solve GANGs with the Parallel Nash Memory algorithm, which provably monotonically converges to an RB-NE.
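
Purely as an illustration (not the paper's Parallel Nash Memory algorithm itself), the sketch below runs a double-oracle-style loop in which each player grows its pool with a resource-bounded best response, here simply the best of a fixed budget of sampled candidates against the opponent's current mixture. The names payoff_fn, candidates_g and candidates_c are hypothetical placeholders, and the uniform mixture stands in for properly re-solving the meta-game.

<code python>
import numpy as np

def rb_best_response(payoff_fn, opp_mix, opp_pool, sample_candidate, budget=50):
    # Resource-bounded best response: evaluate only `budget` sampled candidate
    # strategies against the opponent's current mixture and keep the best one.
    best, best_val = None, -np.inf
    for _ in range(budget):
        s = sample_candidate()
        val = sum(p * payoff_fn(s, o) for p, o in zip(opp_mix, opp_pool))
        if val > best_val:
            best, best_val = s, val
    return best

def rb_ne_loop(payoff_fn, candidates_g, candidates_c, iters=10):
    # Double-oracle-style loop: each player's pool grows by one resource-bounded
    # best response per iteration.  A uniform mixture over each pool stands in
    # for re-solving the meta-game, which the real algorithm would do properly.
    pool_g, pool_c = [candidates_g()], [candidates_c()]
    for _ in range(iters):
        mix_g = np.ones(len(pool_g)) / len(pool_g)
        mix_c = np.ones(len(pool_c)) / len(pool_c)
        # G best-responds to C's mixture; C best-responds to G's (zero-sum payoff).
        new_g = rb_best_response(payoff_fn, mix_c, pool_c, candidates_g)
        new_c = rb_best_response(lambda c, g: -payoff_fn(g, c), mix_g, pool_g, candidates_c)
        pool_g.append(new_g)
        pool_c.append(new_c)
    return pool_g, pool_c
</code>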

https://arxiv.org/abs/1804.06500v2 Two-Player Games for Efficient Non-Convex Constrained Optimization

The Lagrangian can be interpreted as a two-player game played between a player who seeks to optimize over the model parameters, and a player who wishes to maximize over the Lagrange multipliers. We propose a non-zero-sum variant of the Lagrangian formulation that can cope with non-differentiable (even discontinuous) constraints, which we call the "proxy-Lagrangian". The first player minimizes external regret in terms of easy-to-optimize "proxy constraints", while the second player enforces the original constraints by minimizing swap regret.
For this new formulation, as for the Lagrangian in the non-convex setting, the result is a stochastic classifier. For both the proxy-Lagrangian and Lagrangian formulations, however, we prove that this classifier, instead of having unbounded size, can be taken to be a distribution over no more than m+1 models (where m is the number of constraints). This is a significant improvement in practical terms. https://github.com/tensorflow/tensorflow/tree/r1.10/tensorflow/contrib/constrained_optimization
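
The linked TensorFlow library is the place to look for the real implementation; the sketch below is only a minimal illustration of the basic Lagrangian game the abstract describes (gradient descent on the model parameters, projected gradient ascent on the multipliers), with loss_grad, constraint_vals and constraint_grads as assumed placeholder functions. The proxy-Lagrangian and swap-regret machinery of the paper are not reproduced here.

<code python>
import numpy as np

def lagrangian_game(loss_grad, constraint_vals, constraint_grads, theta0, m,
                    lr_theta=0.01, lr_lam=0.01, steps=1000):
    # Two-player Lagrangian game: the theta-player does gradient descent on
    # loss(theta) + lam . constraints(theta); the lam-player does projected
    # gradient ascent on the constraint values (multipliers kept >= 0).
    theta = np.array(theta0, dtype=float)
    lam = np.zeros(m)
    iterates = []
    for _ in range(steps):
        g_theta = loss_grad(theta) + constraint_grads(theta).T @ lam
        theta = theta - lr_theta * g_theta                             # player 1: minimize
        lam = np.maximum(lam + lr_lam * constraint_vals(theta), 0.0)   # player 2: maximize
        iterates.append(theta.copy())
    # In the non-convex setting the output is a stochastic classifier, i.e. a
    # distribution over saved iterates; the paper shows m+1 of them suffice.
    return iterates, lam
</code>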

https://arxiv.org/pdf/1810.01218v1.pdf AlphaSeq: Sequence Discovery with Deep Reinforcement Learning

https://arxiv.org/abs/1811.08469 Stable Opponent Shaping in Differentiable Games
