game_theoretic_learning [2018/03/26 17:48] admin
game_theoretic_learning [2019/01/07 19:09] (current) admin

https://arxiv.org/abs/1803.06579v1 A Multi-perspective Approach To Anomaly Detection For Self-aware Embodied Agents

https://arxiv.org/pdf/1802.05642.pdf The Mechanics of n-Player Differentiable Games

https://arxiv.org/abs/1805.02777v1 What game are we playing? End-to-end learning in normal and extensive form games

https://arxiv.org/abs/1409.5531 A mathematical theory of resources

We prove general theorems about how resource theories can be constructed from theories of processes in which a special class of processes is implementable at no cost; these free processes define the means by which the costly states and processes can be interconverted.

https://arxiv.org/pdf/1805.07440.pdf AlphaX: eXploring Neural Architectures with Deep Neural Networks and Monte Carlo Tree Search

AlphaX also generates the training data for Meta-DNN, so the learning of Meta-DNN is end-to-end. In searching for NASNet-style architectures, AlphaX found several promising architectures with up to 1% higher accuracy than NASNet using only 17 GPUs for 5 days, a speedup of up to 23.5x over the original NASNet search, which used 500 GPUs for 4 days.
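
The search loop described in this abstract (MCTS proposing architectures, a learned model scoring them) can be illustrated with a toy Monte Carlo tree search over a few discrete layer choices. Everything below is an illustrative stand-in, not AlphaX itself: the `OPS` search space, the depth, and the `proxy_accuracy` function (which substitutes for actually training a child network, and for the Meta-DNN's predicted accuracy) are all assumptions.

```python
import math
import random

OPS = ["conv3x3", "conv5x5", "maxpool", "identity"]  # toy search space (assumption)
DEPTH = 3                                            # number of layer choices

def proxy_accuracy(arch):
    # Stand-in for training the candidate network and reading off its
    # validation accuracy; deterministic per architecture within a run.
    return random.Random(hash(tuple(arch))).random()

class Node:
    def __init__(self, arch=()):
        self.arch = arch        # partial architecture: tuple of chosen ops
        self.children = {}      # op name -> child Node
        self.visits = 0
        self.value = 0.0        # running mean reward

    def ucb_child(self, c=1.4):
        # UCB1 selection: trade off mean reward against an exploration bonus.
        return max(self.children.values(),
                   key=lambda n: n.value + c * math.sqrt(math.log(self.visits) / n.visits))

def search(n_sims=200):
    root = Node()
    for _ in range(n_sims):
        node, path = root, [root]
        # Selection / expansion: walk down, expanding an unseen op if any remain.
        while len(node.arch) < DEPTH:
            unexpanded = [op for op in OPS if op not in node.children]
            if unexpanded:
                op = random.choice(unexpanded)
                node.children[op] = Node(node.arch + (op,))
                node = node.children[op]
                path.append(node)
                break
            node = node.ucb_child()
            path.append(node)
        # Rollout: finish the architecture uniformly at random, then score it.
        arch = list(node.arch)
        while len(arch) < DEPTH:
            arch.append(random.choice(OPS))
        reward = proxy_accuracy(arch)
        # Backpropagation: update running means along the visited path.
        for n in path:
            n.visits += 1
            n.value += (reward - n.value) / n.visits
    # Decode the most-visited chain of choices as the final architecture.
    node = root
    while node.children:
        node = max(node.children.values(), key=lambda n: n.visits)
    return list(node.arch)

best = search()
```

In the real system the rollout score comes from training the child network (or from the Meta-DNN's prediction), which is exactly why the tree search, rather than random search, pays off: visit counts concentrate on the promising branches.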

https://arxiv.org/abs/1805.09613v1 A0C: Alpha Zero in Continuous Action Space

This paper presents the necessary theoretical extensions of Alpha Zero to deal with a continuous action space, along with preliminary experiments on the Pendulum swing-up task that empirically show the feasibility of the approach. This work thereby provides a first step towards applying iterated search and learning in domains with a continuous action space.
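
A standard way to make MCTS-style search work over continuous actions is progressive widening: a node may admit a newly sampled action only while its action count stays below C * n^alpha for visit count n, and otherwise selects among the actions it already has. The single-state sketch below is only an illustration of that rule, not the A0C algorithm: the quadratic reward stands in for the environment return, and uniform sampling stands in for drawing actions from a learned policy network.

```python
import math
import random

def reward(a):
    # Illustrative deterministic reward over a continuous action in [-1, 1],
    # peaked at a = 0.3 (a stand-in for the environment return).
    return -(a - 0.3) ** 2

def pw_bandit(n_iters=2000, c_pw=1.5, alpha=0.5, c_ucb=0.3):
    actions, visits, means = [], [], []
    for t in range(1, n_iters + 1):
        if len(actions) < c_pw * t ** alpha:
            # Widening rule: admit one new sampled action.
            actions.append(random.uniform(-1.0, 1.0))
            visits.append(0)
            means.append(0.0)
            i = len(actions) - 1
        else:
            # Standard UCB over the finite set of admitted actions.
            i = max(range(len(actions)),
                    key=lambda j: means[j] + c_ucb * math.sqrt(math.log(t) / visits[j]))
        r = reward(actions[i])
        visits[i] += 1
        means[i] += (r - means[i]) / visits[i]
    # Greedy by mean reward; with a deterministic reward this is simply the
    # admitted action closest to the optimum.
    best = max(range(len(actions)), key=lambda j: means[j])
    return actions[best]

a_star = pw_bandit()
```

The point of the widening schedule is that the action set stays small early (so statistics are reliable) and grows sublinearly with visits (so resolution around good actions keeps improving).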

https://arxiv.org/abs/1712.00679v2 GANGs: Generative Adversarial Network Games

The size of these games precludes exact solution methods, so we define resource-bounded best responses (RBBRs) and a resource-bounded Nash equilibrium (RB-NE): a pair of mixed strategies such that neither G nor C can find a better RBBR. The RB-NE solution concept is richer than the notion of "local Nash equilibria" in that it captures not only failures to escape local optima of gradient descent but also applies to any approximate best-response computation, including methods with random restarts. To validate our approach, we solve GANGs with the Parallel Nash Memory algorithm, which provably monotonically converges to an RB-NE.
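
The oracle loop behind this style of algorithm is easiest to see on a toy zero-sum matrix game: keep growing pools of pure strategies, solve the restricted game for an approximate mixed equilibrium, and add a best response whenever one still improves. Everything below is an illustrative stand-in for the paper's setup: the rock-paper-scissors payoff matrix, the fictitious-play meta-solver, and the exact (rather than resource-bounded) best-response oracles.

```python
# Row player's payoff in a toy zero-sum game (rock-paper-scissors);
# the column player receives the negation.
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]

def restricted_equilibrium(rows, cols, iters=5000):
    # Approximate a mixed equilibrium of the restricted game by fictitious play:
    # each side repeatedly best-responds to the other's empirical mixture.
    rc = [1] + [0] * (len(rows) - 1)
    cc = [1] + [0] * (len(cols) - 1)
    for _ in range(iters):
        br_r = max(range(len(rows)),
                   key=lambda i: sum(cc[j] * PAYOFF[rows[i]][cols[j]]
                                     for j in range(len(cols))))
        br_c = max(range(len(cols)),
                   key=lambda j: -sum(rc[i] * PAYOFF[rows[i]][cols[j]]
                                      for i in range(len(rows))))
        rc[br_r] += 1
        cc[br_c] += 1
    return [x / sum(rc) for x in rc], [x / sum(cc) for x in cc]

def oracle_loop():
    rows, cols = [0], [0]   # strategy pools start with one pure strategy each
    while True:
        p, q = restricted_equilibrium(rows, cols)
        # Oracles: best responses over ALL pure strategies against the restricted
        # equilibrium (the paper's RBBRs are resource-bounded versions of this step).
        br_r = max(range(3), key=lambda i: sum(
            q[j] * PAYOFF[i][cols[j]] for j in range(len(cols))))
        br_c = max(range(3), key=lambda j: -sum(
            p[i] * PAYOFF[rows[i]][j] for i in range(len(rows))))
        grew = False
        if br_r not in rows:
            rows.append(br_r)
            grew = True
        if br_c not in cols:
            cols.append(br_c)
            grew = True
        if not grew:
            # No new best response exists: the restricted equilibrium is an
            # equilibrium of the full game.
            return rows, cols, p, q

rows, cols, p, q = oracle_loop()
```

On rock-paper-scissors the pools grow to all three strategies and the returned mixtures approach the uniform equilibrium; in a GANG the pure strategies are trained generator and classifier networks, and the best-response step is a bounded gradient-based training run.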
 +
 +https://​arxiv.org/​abs/​1804.06500v2 Two-Player Games for Efficient Non-Convex Constrained Optimization
 +
 +The Lagrangian can be interpreted as a two-player game played between a player who seeks to optimize over the model parameters, and a player who wishes to maximize over the Lagrange multipliers. We propose a non-zero-sum variant of the Lagrangian formulation that can cope with non-differentiable--even discontinuous--constraints,​ which we call the "​proxy-Lagrangian"​. The first player minimizes external regret in terms of easy-to-optimize "proxy constraints",​ while the second player enforces the original constraints by minimizing swap regret. ​
 +For this new formulation,​ as for the Lagrangian in the non-convex setting, the result is a stochastic classifier. For both the proxy-Lagrangian and Lagrangian formulations,​ however, we prove that this classifier, instead of having unbounded size, can be taken to be a distribution over no more than m+1 models (where m is the number of constraints). This is a significant improvement in practical terms. ​ https://​github.com/​tensorflow/​tensorflow/​tree/​r1.10/​tensorflow/​contrib/​constrained_optimization
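
The two-player reading of the Lagrangian is easy to see on a one-dimensional problem: minimize (x - 2)^2 subject to x <= 1. One player does gradient descent on the model parameter, the other does projected gradient ascent on the multiplier. This toy gradient-descent-ascent loop illustrates only the basic game formulation, not the paper's proxy-Lagrangian algorithm; the step sizes and iteration count are arbitrary choices.

```python
def solve(lr_x=0.01, lr_lam=0.01, steps=20000):
    # Minimize f(x) = (x - 2)^2  subject to  g(x) = x - 1 <= 0.
    # Lagrangian: L(x, lam) = (x - 2)^2 + lam * (x - 1), with lam >= 0.
    x, lam = 0.0, 0.0
    for _ in range(steps):
        grad_x = 2.0 * (x - 2.0) + lam      # dL/dx: the x-player descends
        grad_lam = x - 1.0                  # dL/dlam: the lam-player ascends
        x -= lr_x * grad_x
        lam = max(0.0, lam + lr_lam * grad_lam)  # projection keeps lam >= 0
    return x, lam

x, lam = solve()
# KKT conditions give the saddle point analytically: the constraint is active,
# so x* = 1 and 2(x* - 2) + lam* = 0 yields lam* = 2.
```

The unconstrained minimizer x = 2 is infeasible, so the multiplier grows until it prices the constraint correctly; the iterates settle at the saddle point (1, 2).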

https://arxiv.org/pdf/1810.01218v1.pdf AlphaSeq: Sequence Discovery with Deep Reinforcement Learning

https://arxiv.org/abs/1811.08469 Stable Opponent Shaping in Differentiable Games