The Lagrangian can be interpreted as a two-player game played between a player who seeks to minimize over the model parameters, and a player who wishes to maximize over the Lagrange multipliers. We propose a non-zero-sum variant of the Lagrangian formulation that can cope with non-differentiable (even discontinuous) constraints, which we call the "proxy-Lagrangian". The first player minimizes external regret in terms of easy-to-optimize "proxy constraints", while the second player enforces the original constraints by minimizing swap regret.

For this new formulation, as for the Lagrangian in the non-convex setting, the result is a stochastic classifier. For both the proxy-Lagrangian and Lagrangian formulations, however, we prove that this classifier, instead of having unbounded size, can be taken to be a distribution over no more than m+1 models (where m is the number of constraints). This is a significant improvement in practical terms.

https://github.com/tensorflow/tensorflow/tree/r1.10/tensorflow/contrib/constrained_optimization
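As a rough intuition pump, here is a minimal NumPy sketch of the Lagrangian viewed as a two-player game, on a toy convex problem: one player takes gradient-descent steps on the parameters while the other takes projected gradient-ascent steps on the multiplier. The objective, constraint, and step size are illustrative assumptions, and this is plain simultaneous gradient play, not the linked TensorFlow library's API or the paper's proxy-Lagrangian algorithm.

<code python>
import numpy as np

# Toy problem (illustrative assumption, not the paper's setup):
# minimize f(theta) subject to g(theta) <= 0, via the Lagrangian
# L(theta, lam) = f(theta) + lam * g(theta).

def f(theta):
    """Objective: squared distance to the origin."""
    return 0.5 * np.sum(theta ** 2)

def grad_f(theta):
    return theta

def g(theta):
    """Constraint theta[0] >= 1, written as g(theta) <= 0."""
    return 1.0 - theta[0]

def grad_g(theta):
    d = np.zeros_like(theta)
    d[0] = -1.0
    return d

theta = np.array([0.0, 0.0])   # player 1's variable (model parameters)
lam = 0.0                      # player 2's variable (Lagrange multiplier)
eta = 0.05                     # step size for both players

for _ in range(2000):
    # Player 1: gradient descent on theta, minimizing L(theta, lam).
    theta = theta - eta * (grad_f(theta) + lam * grad_g(theta))
    # Player 2: projected gradient ascent on lam, maximizing L (lam >= 0).
    lam = max(0.0, lam + eta * g(theta))

print(theta, lam)  # converges near theta = [1, 0] with lam = 1
</code>

In the non-convex setting the paper targets, such simultaneous gradient play need not settle on a single parameter vector, which is why the guaranteed solution is a stochastic classifier supported on at most m+1 models.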
https://arxiv.org/pdf/1810.01218v1.pdf AlphaSeq: Sequence Discovery with Deep Reinforcement Learning

https://arxiv.org/abs/1811.08469 Stable Opponent Shaping in Differentiable Games