Game Theoretic Learning

found several promising architectures with up to 1% higher accuracy than NASNet using only 17 GPUs for 5 days, demonstrating up to a 23.5x speedup over the original NASNet search, which used 500 GPUs for 4 days.

https://arxiv.org/abs/1805.09613v1 A0C: Alpha Zero in Continuous Action Space

This paper presents the necessary theoretical extensions of Alpha Zero to deal with continuous action space. We also provide some preliminary experiments on the Pendulum swing-up task, empirically showing the feasibility of our approach. Thereby, this work provides a first step towards the application of iterated search and learning in domains with a continuous action space.

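The excerpt above does not spell out the mechanism, so the following is only a rough illustration: a common way to make AlphaZero-style tree search work with continuous actions is progressive widening, where a node may only add a newly sampled action from the policy network while its child count is small relative to its visit count. The sketch below assumes exactly that; policy_sample, value_estimate, env_step and all constants are hypothetical placeholders, not the authors' code.

<code python>
import math
import random

# Hypothetical stand-ins for the learned networks and the environment model.
def policy_sample(state):
    # Sample a continuous action, e.g. from a learned Gaussian policy.
    return random.gauss(0.0, 1.0)

def value_estimate(state):
    # Value-network estimate used to bootstrap at the search horizon.
    return 0.0

def env_step(state, action):
    # Deterministic toy dynamics: next state and reward.
    return state + action, -abs(state + action)

class Node:
    def __init__(self, state):
        self.state = state
        self.visits = 0
        self.children = {}  # action -> [child Node, visit count, total return]

def select_action(node, c_pw=1.0, alpha=0.5, c_uct=1.4):
    # Progressive widening: sample a fresh action from the policy only while
    # the number of children is small relative to the node's visit count.
    if len(node.children) <= c_pw * max(node.visits, 1) ** alpha:
        action = policy_sample(node.state)
        next_state, _ = env_step(node.state, action)
        node.children[action] = [Node(next_state), 0, 0.0]
        return action
    # Otherwise choose among the existing (finite) children with a UCT rule.
    def uct(item):
        _, (_, n, w) = item
        return w / (n + 1e-8) + c_uct * math.sqrt(math.log(node.visits + 1) / (n + 1e-8))
    return max(node.children.items(), key=uct)[0]

def simulate(node, depth=10):
    # One search simulation: descend the tree, then bootstrap with the value net.
    if depth == 0:
        return value_estimate(node.state)
    action = select_action(node)
    child, n, w = node.children[action]
    _, reward = env_step(node.state, action)
    ret = reward + simulate(child, depth - 1)
    node.children[action] = [child, n + 1, w + ret]
    node.visits += 1
    return ret

root = Node(state=0.0)
for _ in range(300):
    simulate(root)
best_action = max(root.children.items(), key=lambda kv: kv[1][1])[0]
print("most-visited root action:", round(best_action, 3))
</code>

The point of the sketch is that widening keeps the branching factor finite, so the usual discrete UCT selection and visit-count statistics still apply among the sampled actions.
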
https://arxiv.org/abs/1712.00679v2 GANGs: Generative Adversarial Network Games

The size of these games precludes exact solution methods, therefore we define resource-bounded best responses (RBBRs), and a resource-bounded Nash Equilibrium (RB-NE) as a pair of mixed strategies such that neither G nor C can find a better RBBR. The RB-NE solution concept is richer than the notion of 'local Nash equilibria' in that it captures not only failures to escape local optima of gradient descent, but applies to any approximate best response computation, including methods with random restarts. To validate our approach, we solve GANGs with the Parallel Nash Memory algorithm, which provably monotonically converges to an RB-NE.

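The excerpt describes the solution approach only at a high level. As a rough illustration of that kind of best-response loop, the toy code below keeps a growing memory of strategies for each player, solves the finite meta-game over those memories for mixed strategies, and then lets each player compute a budget-limited ("resource-bounded") best response against the other's mixture. The scalar payoff, the finite-difference RBBR and the fictitious-play meta-solver are assumptions made for illustration; this is not the Parallel Nash Memory algorithm itself.

<code python>
import numpy as np

# Toy zero-sum game between a generator parameter g (minimizer) and a
# classifier parameter c (maximizer); in a real GANG these would be networks.
def payoff(g, c):
    # Convex in g, concave in c, with a unique saddle point at (g, c) = (0, 2).
    return (g - 1.0) ** 2 - (c - 2.0) ** 2 + g * c

def rb_best_response(start, opp_mix, opp_strats, maximize, steps=50, lr=0.1):
    # "Resource-bounded" best response: only a fixed budget of gradient steps
    # against the opponent's current mixed strategy.
    x = start
    eps = 1e-4
    for _ in range(steps):
        def expected(v):
            if maximize:  # classifier responds to the generator mixture
                return sum(p * payoff(g, v) for p, g in zip(opp_mix, opp_strats))
            return sum(p * payoff(v, c) for p, c in zip(opp_mix, opp_strats))
        grad = (expected(x + eps) - expected(x - eps)) / (2 * eps)
        x = x + lr * grad if maximize else x - lr * grad
    return x

def solve_meta_game(G, C, iters=2000):
    # Fictitious play on the finite meta-game over the stored strategies,
    # giving approximate mixed strategies for both memories.
    A = np.array([[payoff(g, c) for c in C] for g in G])  # rows: G (min), cols: C (max)
    g_counts, c_counts = np.ones(len(G)), np.ones(len(C))
    for _ in range(iters):
        g_counts[int(np.argmin(A @ (c_counts / c_counts.sum())))] += 1
        c_counts[int(np.argmax((g_counts / g_counts.sum()) @ A))] += 1
    return g_counts / g_counts.sum(), c_counts / c_counts.sum()

G_mem, C_mem = [0.0], [0.0]  # strategy memories for generator and classifier
for _ in range(8):
    g_mix, c_mix = solve_meta_game(G_mem, C_mem)
    # Each player computes an RBBR against the other's mixture; the memories grow,
    # and the loop would stop once neither side can find a strictly better RBBR.
    new_g = rb_best_response(G_mem[int(np.argmax(g_mix))], c_mix, C_mem, maximize=False)
    new_c = rb_best_response(C_mem[int(np.argmax(c_mix))], g_mix, G_mem, maximize=True)
    G_mem.append(new_g)
    C_mem.append(new_c)
print("generator memory:", [round(g, 3) for g in G_mem])
print("classifier memory:", [round(c, 3) for c in C_mem])
</code>

On this toy payoff the memories drift towards the saddle point (0, 2); with networks, the RBBR would instead be a bounded run of SGD on the generator or classifier parameters.
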
https://arxiv.org/abs/1804.06500v2 Two-Player Games for Efficient Non-Convex Constrained Optimization

The Lagrangian can be interpreted as a two-player game played between a player who seeks to optimize over the model parameters, and a player who wishes to maximize over the Lagrange multipliers. We propose a non-zero-sum variant of the Lagrangian formulation that can cope with non-differentiable (even discontinuous) constraints, which we call the "proxy-Lagrangian". The first player minimizes external regret in terms of easy-to-optimize "proxy constraints", while the second player enforces the original constraints by minimizing swap regret.
For this new formulation, as for the Lagrangian in the non-convex setting, the result is a stochastic classifier. For both the proxy-Lagrangian and Lagrangian formulations, however, we prove that this classifier, instead of having unbounded size, can be taken to be a distribution over no more than m+1 models (where m is the number of constraints). This is a significant improvement in practical terms.

https://github.com/tensorflow/tensorflow/tree/r1.10/tensorflow/contrib/constrained_optimization

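Since the main object here is the Lagrangian-as-a-game view, below is a minimal sketch of that basic idea on an invented one-dimensional problem: one player does gradient descent on the model parameter, the other does projected gradient ascent on the multiplier. It shows only the plain Lagrangian game; the paper's actual contribution (the proxy-Lagrangian with external-regret and swap-regret players, and the stochastic classifier over at most m+1 models) is not reproduced, and this is not the API of the linked TensorFlow code.

<code python>
# Toy constrained problem:  minimize (theta - 2)^2  subject to  theta <= 1.
# Lagrangian game: L(theta, lam) = (theta - 2)^2 + lam * (theta - 1), lam >= 0.

def constraint(theta):
    # Feasible iff constraint(theta) <= 0.
    return theta - 1.0

theta, lam = 0.0, 0.0
lr_theta, lr_lam = 0.05, 0.05
iterates = []
for _ in range(2000):
    # Player 1: gradient descent step on L in theta.
    grad_theta = 2.0 * (theta - 2.0) + lam
    theta -= lr_theta * grad_theta
    # Player 2: gradient ascent step on L in lam, projected onto lam >= 0.
    lam = max(0.0, lam + lr_lam * constraint(theta))
    iterates.append(theta)

# The paper's non-convex guarantee is stated for a stochastic classifier, i.e. a
# small distribution over saved iterates; on this convex toy problem the iterates
# simply settle at the constrained optimum theta = 1 with multiplier lam = 2.
print("final theta:", round(theta, 3), "final multiplier:", round(lam, 3))
print("mean of last 200 iterates:", round(sum(iterates[-200:]) / 200.0, 3))
</code>
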
https://arxiv.org/pdf/1810.01218v1.pdf AlphaSeq: Sequence Discovery with Deep Reinforcement Learning

https://arxiv.org/abs/1811.08469 Stable Opponent Shaping in Differentiable Games