This shows you the differences between two versions of the page.

game_theoretic_learning [2018/05/09 01:11]
game_theoretic_learning [2018/10/04 12:01]
Line 54:
 https://arxiv.org/abs/1803.06579v1 A Multi-perspective Approach To Anomaly Detection For Self-aware Embodied Agents
 +https://arxiv.org/pdf/1802.05642.pdf The Mechanics of n-Player Differentiable Games
 https://arxiv.org/abs/1805.02777v1 What game are we playing? End-to-end learning in normal and extensive form games
 +https://arxiv.org/abs/1409.5531 A mathematical theory of resources
 +We prove some general theorems about how resource theories can be constructed from theories of processes wherein there is a special class of processes that are implementable at no cost and which define the means by which the costly states and processes can be interconverted one to another.
 +https://arxiv.org/pdf/1805.07440.pdf AlphaX: eXploring Neural Architectures with Deep Neural Networks and Monte Carlo Tree Search
 +AlphaX also generates the training data for Meta-DNN, so the learning
 +of Meta-DNN is end-to-end. In searching for NASNet-style architectures, AlphaX
 +found several promising architectures with up to 1% higher accuracy than NASNet
 +using only 17 GPUs for 5 days, demonstrating up to a 23.5x speedup over the original
 +NASNet search, which used 500 GPUs for 4 days.
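The 23.5x figure quoted above can be checked with simple arithmetic. The sketch below assumes the speedup is measured in GPU-days (GPUs multiplied by days), which the excerpt does not state explicitly; the variable names are illustrative only.

```python
# GPU-day budgets taken from the figures quoted in the excerpt.
# Assumption: "speedup" means the ratio of total GPU-days.
alphax_gpu_days = 17 * 5    # AlphaX: 17 GPUs for 5 days
nasnet_gpu_days = 500 * 4   # original NASNet search: 500 GPUs for 4 days

speedup = nasnet_gpu_days / alphax_gpu_days
print(f"{alphax_gpu_days} vs {nasnet_gpu_days} GPU-days, ratio {speedup:.1f}x")
```

Under that assumption the ratio is 2000 / 85, which rounds to the 23.5x reported.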
 +https://arxiv.org/abs/1805.09613v1 A0C: Alpha Zero in Continuous Action Space
 +This paper presents the necessary theoretical extensions of Alpha Zero to deal with continuous action space. We also provide some preliminary experiments on the Pendulum swing-up task, empirically showing the feasibility of our approach. Thereby, this work provides a first step towards the application of iterated search and learning in domains with a continuous action space.
 +https://arxiv.org/abs/1712.00679v2 GANGs: Generative Adversarial Network Games
 +The size of these games precludes exact solution methods; therefore we define resource-bounded best responses (RBBRs), and a resource-bounded Nash Equilibrium (RB-NE) as a pair of mixed strategies such that neither G nor C can find a better RBBR. The RB-NE solution concept is richer than the notion of 'local Nash equilibria' in that it captures not only failures of escaping local optima of gradient descent, but applies to any approximate best response computations, including methods with random restarts. To validate our approach, we solve GANGs with the Parallel Nash Memory algorithm, which provably monotonically converges to an RB-NE.
 +https://arxiv.org/abs/1804.06500v2 Two-Player Games for Efficient Non-Convex Constrained Optimization
 +The Lagrangian can be interpreted as a two-player game played between a player who seeks to optimize over the model parameters, and a player who wishes to maximize over the Lagrange multipliers. We propose a non-zero-sum variant of the Lagrangian formulation that can cope with non-differentiable (even discontinuous) constraints, which we call the "proxy-Lagrangian". The first player minimizes external regret in terms of easy-to-optimize "proxy constraints", while the second player enforces the original constraints by minimizing swap regret.
 +For this new formulation, as for the Lagrangian in the non-convex setting, the result is a stochastic classifier. For both the proxy-Lagrangian and Lagrangian formulations, however, we prove that this classifier, instead of having unbounded size, can be taken to be a distribution over no more than m+1 models (where m is the number of constraints). This is a significant improvement in practical terms. https://github.com/tensorflow/tensorflow/tree/r1.10/tensorflow/contrib/constrained_optimization
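The two-player reading of the Lagrangian can be illustrated on a toy problem. The sketch below is not the proxy-Lagrangian, does not use regret minimization, and is unrelated to the linked TensorFlow code. It only shows the ordinary Lagrangian game on an assumed convex example: minimize x^2 subject to x >= 1. The function name `lagrangian_game`, the step size, and the iteration count are all hypothetical choices.

```python
def lagrangian_game(steps=2000, lr=0.05):
    """Simultaneous gradient descent-ascent on L(x, lam) = x**2 + lam * (1 - x).

    Toy problem (an assumption for illustration): minimize x**2 subject to x >= 1.
    The first player descends on x; the second ascends on the multiplier lam.
    """
    x, lam = 0.0, 0.0
    for _ in range(steps):
        grad_x = 2.0 * x - lam   # dL/dx
        grad_lam = 1.0 - x       # dL/dlam, i.e. the constraint violation
        x = x - lr * grad_x                  # parameter player: gradient descent
        lam = max(0.0, lam + lr * grad_lam)  # multiplier player: projected ascent
    return x, lam


x_star, lam_star = lagrangian_game()
print(x_star, lam_star)
```

For this toy problem the KKT conditions give x = 1 and lam = 2, and the iterates approach that point. In the non-convex setting discussed in the abstract, such plain descent-ascent carries no comparable guarantee, which is the gap the stochastic-classifier result addresses.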
 +https://arxiv.org/pdf/1810.01218v1.pdf AlphaSeq: Sequence Discovery with Deep Reinforcement Learning