# Game Theoretic Learning
The size of these games precludes exact solution methods; we therefore define resource-bounded best responses (RBBRs) and a resource-bounded Nash equilibrium (RB-NE) as a pair of mixed strategies such that neither G nor C can find a better RBBR.
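The RBBR idea can be sketched on a toy zero-sum matrix game (not the paper's GAN setting). In this sketch, which is my own illustration rather than the paper's algorithm, a best response is "resource-bounded" by restricting the search to a sampled `budget` of pure strategies, and a Parallel-Nash-Memory-style loop grows a memory of discovered strategies until neither player's RBBR improves on the current mixture. The names `rb_nash` and `budget`, and the fictitious-play sub-solver, are assumptions of this sketch.

```python
import numpy as np

def fictitious_play(A, iters=2000):
    """Approximate a mixed equilibrium of the zero-sum matrix game A
    (row player maximizes x^T A y, column player minimizes)."""
    n, m = A.shape
    row_counts, col_counts = np.zeros(n), np.zeros(m)
    r, c = 0, 0
    for _ in range(iters):
        row_counts[r] += 1
        col_counts[c] += 1
        # Each player best-responds to the opponent's empirical mixture.
        r = int(np.argmax(A @ (col_counts / col_counts.sum())))
        c = int(np.argmin((row_counts / row_counts.sum()) @ A))
    return row_counts / row_counts.sum(), col_counts / col_counts.sum()

def rb_nash(A, budget=3, eps=1e-6, max_rounds=50, seed=0):
    """Grow memories of pure strategies until neither player's
    resource-bounded best response (a search over at most `budget`
    sampled pure strategies) improves on the current mixture."""
    rng = np.random.default_rng(seed)
    n, m = A.shape
    R, C = [0], [0]                      # strategy memories
    for _ in range(max_rounds):
        # Solve the restricted game over the memories.
        x_sub, y_sub = fictitious_play(A[np.ix_(R, C)])
        x, y = np.zeros(n), np.zeros(m)
        x[R], y[C] = x_sub, y_sub
        v = float(x @ A @ y)
        # RBBRs: scan only a random subset of pure strategies.
        rows = rng.choice(n, size=min(budget, n), replace=False)
        cols = rng.choice(m, size=min(budget, m), replace=False)
        row_vals = A[rows] @ y
        col_vals = x @ A[:, cols]
        br_r, vr = int(rows[np.argmax(row_vals)]), float(row_vals.max())
        br_c, vc = int(cols[np.argmin(col_vals)]), float(col_vals.min())
        improved = False
        if vr > v + eps and br_r not in R:
            R.append(br_r); improved = True
        if vc < v - eps and br_c not in C:
            C.append(br_c); improved = True
        if not improved:   # neither player found a better RBBR:
            return x, y, v  # a resource-bounded Nash equilibrium
    return x, y, v
```

On rock-paper-scissors with `budget=3` (i.e. exact best responses) this should recover a near-uniform mixture with value near zero; smaller budgets model players who can only explore part of the strategy space, which is what makes the resulting equilibrium resource-bounded rather than exact.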
The RB-NE solution concept is richer than the notion of local Nash equilibria in that it captures not only failures to escape local optima of gradient descent, but applies to any approximate best-response computation, including methods with random restarts. To validate our approach, we solve GANGs with the Parallel Nash Memory algorithm, which provably monotonically converges to an RB-NE.

https://arxiv.org/abs/1804.06500v2 Two-Player Games for Efficient Non-Convex Constrained Optimization

The Lagrangian can be interpreted as a two-player game between a player who minimizes over the model parameters and a player who maximizes over the Lagrange multipliers. We propose a non-zero-sum variant of the Lagrangian formulation that can cope with non-differentiable, even discontinuous, constraints, which we call the "proxy-Lagrangian". The first player minimizes external regret in terms of easy-to-optimize "proxy constraints", while the second player enforces the original constraints by minimizing swap regret. For this new formulation, as for the Lagrangian in the non-convex setting, the result is a stochastic classifier. For both the proxy-Lagrangian and Lagrangian formulations, however, we prove that this classifier, instead of having unbounded size, can be taken to be a distribution over no more than m+1 models (where m is the number of constraints). This is a significant improvement in practical terms. https://github.com/tensorflow/tensorflow/tree/r1.10/tensorflow/contrib/constrained_optimization

https://arxiv.org/pdf/1810.01218v1.pdf AlphaSeq: Sequence Discovery with Deep Reinforcement Learning

https://arxiv.org/abs/1811.08469 Stable Opponent Shaping in Differentiable Games
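The proxy-Lagrangian game from 1804.06500 above can be illustrated on a toy constrained problem. The sketch below is a deliberate simplification and my own construction, not the paper's method: the multiplier player uses plain projected gradient ascent rather than swap-regret minimization, and averaging late iterates stands in for the paper's distribution over at most m+1 models. `proxy_lagrangian_sketch` and all its parameters are invented for illustration.

```python
import numpy as np

def proxy_lagrangian_sketch(steps=500, lr_theta=0.05, lr_lam=0.5, lam_max=10.0):
    """Toy non-zero-sum Lagrangian game with one constraint (m = 1).

    Minimize f(theta) = ||theta - (2, 1)||^2 subject to the ORIGINAL,
    discontinuous constraint theta[0] <= 0.5. The theta-player descends on
    f + lam * proxy, where proxy = relu(theta[0] - 0.5) is a differentiable
    surrogate; the lam-player ascends on the original 0/1 violation.
    """
    target = np.array([2.0, 1.0])
    theta = np.array([2.0, 0.0])
    lam = 0.0
    history = []
    for _ in range(steps):
        # theta-player: gradient step on f(theta) + lam * proxy(theta).
        grad = 2.0 * (theta - target)
        if theta[0] > 0.5:          # subgradient of the proxy hinge
            grad[0] += lam
        theta = theta - lr_theta * grad
        # lam-player: projected ascent on the ORIGINAL discontinuous
        # constraint (a 0/1 indicator, which the theta-player never sees).
        violation = 1.0 if theta[0] > 0.5 else 0.0
        lam = min(lam + lr_lam * violation, lam_max)
        history.append(theta.copy())
    # The paper's guarantee yields a stochastic classifier, a distribution
    # over at most m + 1 = 2 models; averaging late iterates is a crude
    # stand-in for that in this convex toy.
    theta_avg = np.mean(history[steps // 2:], axis=0)
    return theta_avg, lam
```

Note the division of labor the two players achieve: the theta-player only ever differentiates the smooth proxy hinge, while the multiplier grows whenever the original discontinuous constraint is violated, so the averaged iterate should end up with `theta[0]` near or below the 0.5 threshold while `theta[1]` tracks its unconstrained optimum.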