https://arxiv.org/pdf/1703.00443.pdf OptNet: Differentiable Optimization as a Layer in Neural Networks

We build a framework where the output of the (i+1)-th layer in a network is the solution to a constrained optimization problem based upon previous layers.

The paper proposes a network architecture that integrates optimization problems (here, specifically quadratic programs) as individual layers in larger end-to-end trainable deep networks. These layers can capture complex dependencies between hidden states that traditional convolutional and fully-connected layers cannot.
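
A minimal sketch of the idea in PyTorch, assuming an equality-constrained QP (the paper's OptNet layer also handles inequality constraints and batching): the argmin of the QP satisfies a linear KKT system, so solving that system with differentiable linear algebra lets autograd backpropagate through the layer's parameters.

```python
import torch

def qp_layer(Q, p, A, b):
    """OptNet-style layer restricted to equality constraints:
        minimize 0.5 z^T Q z + p^T z   subject to   A z = b.
    The optimum solves the KKT system [[Q, A^T], [A, 0]] [z; nu] = [-p; b],
    so a differentiable linear solve gives gradients w.r.t. Q, p, A, b."""
    n, m = Q.shape[0], A.shape[0]
    K = torch.zeros(n + m, n + m, dtype=Q.dtype)
    K[:n, :n] = Q
    K[:n, n:] = A.T
    K[n:, :n] = A
    rhs = torch.cat([-p, b])
    sol = torch.linalg.solve(K, rhs)
    return sol[:n]  # optimal z*

# toy usage: learn the linear term p while Q and the constraint stay fixed
Q = torch.eye(3)
A = torch.ones(1, 3)
b = torch.tensor([1.0])
p = torch.zeros(3, requires_grad=True)
z = qp_layer(Q, p, A, b)
z.sum().backward()   # gradients flow through the argmin
print(z, p.grad)
```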

https://arxiv.org/abs/1703.05002v2 Zero-Shot Recognition using Dual Visual-Semantic Mapping Paths

Zero-shot recognition aims to accurately recognize objects of unseen classes by using a shared visual-semantic mapping between the image feature space and the semantic embedding space. This mapping is learned on training data of seen classes and is expected to have transfer ability to unseen classes. In this paper, we tackle this problem by exploiting the intrinsic relationship between the semantic space manifold and the transfer ability of visual-semantic mapping. We formalize their connection and cast zero-shot recognition as a joint optimization problem. Motivated by this, we propose a novel framework for zero-shot recognition, which contains dual visual-semantic mapping paths. Our analysis shows this framework can not only apply prior semantic knowledge to infer underlying semantic manifold in the image feature space, but also generate optimized semantic embedding space, which can enhance the transfer ability of the visual-semantic mapping to unseen classes. The proposed method is evaluated for zero-shot recognition on four benchmark datasets, achieving outstanding results.
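
For context, a standard linear zero-shot baseline on synthetic data (a ridge-regression mapping, not the paper's dual-path method): learn a map W from image features to class semantic embeddings on seen classes, then classify test images by the nearest unseen-class embedding in the semantic space.

```python
import numpy as np

rng = np.random.default_rng(0)
d_feat, d_sem = 64, 16
X_seen = rng.normal(size=(500, d_feat))    # image features of seen-class images
S_seen = rng.normal(size=(500, d_sem))     # semantic embedding of each image's class
S_unseen = rng.normal(size=(10, d_sem))    # embeddings of 10 unseen classes

lam = 1.0                                  # ridge regularization
W = np.linalg.solve(X_seen.T @ X_seen + lam * np.eye(d_feat), X_seen.T @ S_seen)

X_test = rng.normal(size=(5, d_feat))
proj = X_test @ W                          # project test features into semantic space
proj_n = proj / np.linalg.norm(proj, axis=1, keepdims=True)
S_n = S_unseen / np.linalg.norm(S_unseen, axis=1, keepdims=True)
pred = (proj_n @ S_n.T).argmax(axis=1)     # nearest unseen-class embedding (cosine)
print(pred)
```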

https://research.googleblog.com/2017/04/federated-learning-collaborative.html?m=1 Federated Learning: Collaborative Machine Learning without Centralized Training Data

Federated Learning enables mobile phones to collaboratively learn a shared prediction model while keeping all the training data on device, decoupling the ability to do machine learning from the need to store the data in the cloud. This goes beyond the use of local models that make predictions on mobile devices (like the Mobile Vision API and On-Device Smart Reply) by bringing model training to the device as well.
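
A toy FedAvg-style sketch of the idea (illustrative only, not Google's production system): each client runs a few local gradient steps on its own data, and the server only averages the resulting weights, so raw data never leaves the client.

```python
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])

def local_update(w, X, y, lr=0.1, steps=5):
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)   # least-squares gradient
        w = w - lr * grad
    return w

clients = []
for _ in range(4):                               # 4 clients with private data
    X = rng.normal(size=(20, 3))
    clients.append((X, X @ true_w + 0.1 * rng.normal(size=20)))

w_global = np.zeros(3)
for rnd in range(10):                            # communication rounds
    local_ws = [local_update(w_global.copy(), X, y) for X, y in clients]
    w_global = np.mean(local_ws, axis=0)         # server-side averaging
print(w_global)                                  # approaches true_w
```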

https://arxiv.org/abs/1802.03685v1 Learning a SAT Solver from Single-Bit Supervision

We present NeuroSAT, a message passing neural network that learns to solve SAT problems after only being trained as a classifier to predict satisfiability. Although it is not competitive with state-of-the-art SAT solvers, NeuroSAT can solve problems that are substantially larger and more difficult than it ever saw during training by simply running for more iterations. Moreover, NeuroSAT generalizes to novel distributions; after training only on random SAT problems, at test time it can solve SAT problems encoding graph coloring, clique detection, dominating set, and vertex cover problems, all on a range of distributions over small random graphs.
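
A greatly simplified sketch of the message-passing structure (MLP updates instead of the paper's LSTMs, no literal-flip operation, single-bit readout, all my simplifications): literals and clauses exchange messages over the clause-literal incidence matrix for T rounds, then the literal embeddings are pooled into one satisfiability logit.

```python
import torch
import torch.nn as nn

class TinyNeuroSAT(nn.Module):
    def __init__(self, d=32):
        super().__init__()
        self.lit_init = nn.Parameter(torch.randn(d))
        self.cls_init = nn.Parameter(torch.randn(d))
        self.lit_update = nn.Sequential(nn.Linear(2 * d, d), nn.ReLU(), nn.Linear(d, d))
        self.cls_update = nn.Sequential(nn.Linear(2 * d, d), nn.ReLU(), nn.Linear(d, d))
        self.readout = nn.Linear(d, 1)

    def forward(self, M, T=16):
        # M: (n_clauses, n_literals) 0/1 incidence matrix of a CNF formula
        n_c, n_l = M.shape
        L = self.lit_init.expand(n_l, -1)
        C = self.cls_init.expand(n_c, -1)
        for _ in range(T):                      # run more iterations at test time
            C = self.cls_update(torch.cat([C, M @ L], dim=1))
            L = self.lit_update(torch.cat([L, M.T @ C], dim=1))
        return self.readout(L.mean(dim=0))      # single SAT/UNSAT logit

# usage: (x1 or x2) and (not x1 or x2), literal columns [x1, not x1, x2, not x2]
M = torch.tensor([[1., 0., 1., 0.],
                  [0., 1., 1., 0.]])
logit = TinyNeuroSAT()(M)
# training would apply BCEWithLogitsLoss against the single sat/unsat bit
```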

https://arxiv.org/abs/1802.03676v2 Differentiable Dynamic Programming for Structured Prediction and Attention

Dynamic programming (DP) solves a variety of structured combinatorial problems by iteratively breaking them down into smaller subproblems. In spite of their versatility, DP algorithms are usually non-differentiable, which hampers their use as a layer in neural networks trained by backpropagation. To address this issue, we propose to smooth the max operator in the dynamic programming recursion, using a strongly convex regularizer. This relaxes both the optimal value and the solution of the original combinatorial problem, and turns a broad class of DP algorithms into differentiable operators. Theoretically, we provide a new probabilistic perspective on backpropagating through these DP operators, and relate them to inference in graphical models. We derive two particular instantiations of our framework, a smoothed Viterbi algorithm for sequence prediction and a smoothed DTW algorithm for time-series alignment. We showcase these instantiations on two structured prediction tasks and on structured and sparse attention for neural machine translation.
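
A minimal sketch of the smoothed-max idea (my simplified, entropy-regularized version, close in spirit to the paper's smoothed DTW instantiation): with negentropy regularization, the smoothed max is a logsumexp whose gradient is a softmax, so replacing the hard min in the DTW recursion makes the alignment cost differentiable end to end.

```python
import torch

def smooth_max(x, gamma=1.0):
    # Entropy-regularized max: gamma * logsumexp(x / gamma); its gradient is a
    # softmax, which is what makes the DP recursion differentiable.
    return gamma * torch.logsumexp(x / gamma, dim=-1)

def smooth_dtw(D, gamma=1.0):
    # Replace the hard min of DTW with a smoothed min (negated smooth_max).
    # D is an (n, m) pairwise-cost matrix; the value is differentiable w.r.t. D.
    n, m = D.shape
    INF = torch.tensor(float("inf"), dtype=D.dtype)
    R = [[INF] * (m + 1) for _ in range(n + 1)]
    R[0][0] = torch.tensor(0.0, dtype=D.dtype)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            prev = torch.stack([R[i - 1][j], R[i][j - 1], R[i - 1][j - 1]])
            R[i][j] = D[i - 1, j - 1] - smooth_max(-prev, gamma)
    return R[n][m]

# toy usage: align two short sequences and backpropagate through the alignment cost
x = torch.randn(5, 1, requires_grad=True)
y = torch.randn(7, 1)
D = (x - y.T) ** 2                  # squared-distance cost matrix, shape (5, 7)
loss = smooth_dtw(D, gamma=0.5)
loss.backward()
print(loss.item(), x.grad.shape)
```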

https://arxiv.org/abs/1704.07183v1 Stochastic Constraint Programming as Reinforcement Learning

https://arxiv.org/abs/1510.03317v1 The Inductive Constraint Programming Loop

https://arxiv.org/abs/1807.01672v1 Ranked Reward: Enabling Self-Play Reinforcement Learning for Combinatorial Optimization

https://arxiv.org/abs/1807.04676v1 A Library for Constraint Consistent Learning

http://proceedings.mlr.press/v80/rosenfeld18a.html Learning to Optimize Combinatorial Functions

https://arxiv.org/abs/1804.06355v1 An Exponential Speedup in Parallel Running Time for Submodular Maximization without Loss in Approximation

https://arxiv.org/abs/1810.10659 Combinatorial Optimization with Graph Convolutional Networks and Guided Tree Search

https://arxiv.org/abs/1811.06128 Machine Learning for Combinatorial Optimization: a Methodological Tour d'Horizon

This paper surveys recent attempts, from both the machine learning and operations research communities, at leveraging machine learning to solve combinatorial optimization problems. Given the hard nature of these problems, state-of-the-art methodologies involve algorithmic decisions that are either too expensive to compute exactly or not mathematically well defined. Machine learning therefore looks like a promising candidate to handle those decisions effectively. We advocate for pushing the integration of machine learning and combinatorial optimization further and detail a methodology for doing so. A main point of the paper is to treat generic optimization problems as data points and to ask what the relevant distribution of problems is for learning on a given task; a toy illustration of that viewpoint follows below.
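
As a toy illustration of that last point (my own example, not taken from the survey): sample knapsack instances from a fixed distribution, describe each instance with a few features, and record which of two greedy heuristics performs better; any off-the-shelf classifier could then be trained on these (feature, label) pairs to pick a heuristic for a new instance drawn from the same distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

def greedy_value(v, w, cap, key):
    # Greedy knapsack: take items in decreasing order of `key` while they fit.
    total, used = 0.0, 0.0
    for i in np.argsort(-key):
        if used + w[i] <= cap:
            total += v[i]
            used += w[i]
    return total

features, labels = [], []
for _ in range(500):                       # instances sampled from one distribution
    v = rng.uniform(1, 10, 20)             # item values
    w = rng.uniform(1, 10, 20)             # item weights
    cap = 0.3 * w.sum()                    # knapsack capacity
    by_value = greedy_value(v, w, cap, key=v)
    by_ratio = greedy_value(v, w, cap, key=v / w)
    features.append([v.mean(), w.mean(), cap, float(np.corrcoef(v, w)[0, 1])])
    labels.append(int(by_ratio >= by_value))   # which heuristic to prefer

print("ratio heuristic at least as good on", np.mean(labels), "of instances")
```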