http://mlg.eng.cam.ac.uk/yarin/blog_2248.html Uncertainty in Deep Learning

https://arxiv.org/abs/1506.02142 Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning

https://arxiv.org/abs/1511.02680 Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding

We present a deep learning framework for probabilistic pixel-wise semantic segmentation, which we term Bayesian SegNet. Semantic segmentation is an important tool for visual scene understanding, and a meaningful measure of uncertainty is essential for decision making. Our contribution is a practical system which is able to predict pixel-wise class labels together with a measure of model uncertainty. We achieve this by Monte Carlo sampling with dropout at test time to generate a posterior distribution over pixel class labels. In addition, we show that modelling uncertainty improves segmentation performance by 2-3% across a number of state-of-the-art architectures such as SegNet, FCN and Dilation Network, with no additional parametrisation. We also observe a significant improvement in performance for smaller datasets, where modelling uncertainty is more effective. We benchmark Bayesian SegNet on the indoor SUN Scene Understanding and outdoor CamVid driving scenes datasets.
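
A minimal PyTorch sketch of the test-time mechanism described above: dropout stays active at inference, the network is sampled several times, and the per-pixel mean and entropy of the sampled softmax outputs give the prediction and its uncertainty. `model` is a placeholder for any dropout-trained segmentation network, not the authors' implementation.

```python
import torch
import torch.nn as nn

def enable_mc_dropout(model: nn.Module) -> None:
    """Put the model in eval mode but keep dropout layers stochastic."""
    model.eval()
    for m in model.modules():
        if isinstance(m, (nn.Dropout, nn.Dropout2d)):
            m.train()

@torch.no_grad()
def mc_dropout_predict(model: nn.Module, image: torch.Tensor, n_samples: int = 50):
    """Return per-pixel labels and a predictive-entropy uncertainty map."""
    enable_mc_dropout(model)
    probs = torch.stack([torch.softmax(model(image), dim=1)
                         for _ in range(n_samples)])    # (S, B, C, H, W)
    mean = probs.mean(dim=0)                            # Monte Carlo p(y | x)
    entropy = -(mean * mean.clamp_min(1e-12).log()).sum(dim=1)  # (B, H, W)
    return mean.argmax(dim=1), entropy
```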

http://www.computervisionblog.com/2016/06/making-deep-networks-probabilistic-via.html

https://arxiv.org/pdf/1612.01474v1.pdf Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles

Our method uses scoring rules as training objectives to encourage the neural network to produce better-calibrated predictions, and uses a combination of ensembles and adversarial training for robustness to model misspecification and dataset shift. Our method is well suited to large-scale distributed computation and can be readily implemented for a wide variety of architectures, such as MLPs and CNNs, including those which do not use dropout (e.g. residual networks). It is perhaps surprising to the Bayesian deep learning community that a non-Bayesian (yet probabilistic) approach can perform as well as Bayesian neural networks. We hope that this work will encourage the community to think about hybrid approaches (e.g. using non-Bayesian approaches such as ensembles) and other interesting metrics for evaluating predictive uncertainty.

Our contribution in this paper is twofold. First, we describe a simple, scalable method for producing predictive uncertainty estimates from neural networks. We demonstrate that two simple modifications to the training pipeline, namely (i) ensembles and (ii) adversarial training [13], are sufficient to obtain well-calibrated uncertainty estimates for supervised learning. Second, we propose a series of tasks for evaluating the quality of the predictive uncertainty estimates, in terms of calibration and generalisation to unknowns in supervised learning problems.
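
Under simplifying assumptions the recipe is short enough to sketch: train M independently initialised networks on a proper scoring rule (cross-entropy here), optionally mix in FGSM adversarial examples, and average the member probabilities at test time. All shapes and hyperparameters below are hypothetical, not the paper's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_net(in_dim=20, n_classes=3):
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, n_classes))

def train_member(net, x, y, epochs=100, eps=0.01, lr=1e-3):
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(epochs):
        x_in = x.clone().requires_grad_(True)
        loss = F.cross_entropy(net(x_in), y)        # NLL: a proper scoring rule
        grad, = torch.autograd.grad(loss, x_in)
        x_adv = x + eps * grad.sign()               # FGSM adversarial examples
        total = F.cross_entropy(net(x), y) + F.cross_entropy(net(x_adv), y)
        opt.zero_grad(); total.backward(); opt.step()
    return net

def ensemble_predict(nets, x):
    """Average member probabilities; disagreement shows up as higher entropy."""
    with torch.no_grad():
        return torch.stack([F.softmax(net(x), dim=1) for net in nets]).mean(0)

x, y = torch.randn(256, 20), torch.randint(0, 3, (256,))
ensemble = [train_member(make_net(), x, y) for _ in range(5)]
probs = ensemble_predict(ensemble, torch.randn(8, 20))
```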

https://www.stat.washington.edu/raftery/Research/PDF/Gneiting2007jasa.pdf Strictly Proper Scoring Rules, Prediction, and Estimation

https://arxiv.org/abs/1609.04468 Sampling Generative Networks

We introduce several techniques for sampling and visualizing the latent spaces of generative models. Replacing linear interpolation with spherical linear interpolation prevents diverging from a model's prior distribution and produces sharper samples. J-Diagrams and MINE grids are introduced as visualizations of manifolds created by analogies and nearest neighbors. We demonstrate two new techniques for deriving attribute vectors: bias-corrected vectors with data replication and synthetic vectors with data augmentation. Binary classification using attribute vectors is presented as a technique supporting quantitative analysis of the latent space. Most techniques are intended to be independent of model type and examples are shown on both Variational Autoencoders and Generative Adversarial Networks.
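
Slerp itself fits in a few lines; a minimal numpy sketch (not the author's code):

```python
import numpy as np

def slerp(t: float, v0: np.ndarray, v1: np.ndarray) -> np.ndarray:
    """Interpolate along the great circle between v0 and v1; t in [0, 1]."""
    u0, u1 = v0 / np.linalg.norm(v0), v1 / np.linalg.norm(v1)
    omega = np.arccos(np.clip(np.dot(u0, u1), -1.0, 1.0))  # angle between them
    if np.isclose(omega, 0.0):            # nearly parallel: fall back to lerp
        return (1.0 - t) * v0 + t * v1
    s = np.sin(omega)
    return (np.sin((1.0 - t) * omega) / s) * v0 + (np.sin(t * omega) / s) * v1

# Walking t from 0 to 1 keeps samples near the Gaussian prior's typical shell,
# whereas linear interpolation cuts through the low-density interior.
z0, z1 = np.random.randn(100), np.random.randn(100)
path = [slerp(t, z0, z1) for t in np.linspace(0.0, 1.0, 10)]
```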

http://www.gatsby.ucl.ac.uk/~gretton/papers/testing_workshop.pdf Learning features to compare distributions

https://arxiv.org/abs/1701.04944v2 A Machine Learning Alternative to P-values

https://arxiv.org/pdf/1701.05226v1.pdf Reasoning in Non-Probabilistic Uncertainty: Logic Programming and Neural-Symbolic Computing as Examples

This article aims to achieve two goals: to show that probability is not the only way of dealing with uncertainty (and, even more, that there are kinds of uncertainty which for principled reasons cannot be addressed with probabilistic means); and to provide evidence that logic-based methods can well support reasoning with uncertainty. For the latter claim, two paradigmatic examples are presented: Logic Programming with Kleene semantics, for modelling reasoning from information in a discourse to an interpretation of the state of affairs of the intended model; and a neural-symbolic implementation of Input/Output logic, for dealing with uncertainty in dynamic normative contexts.
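
To make the three-valued semantics concrete, here is a toy Python sketch of strong Kleene connectives, with `None` standing in for the third truth value "unknown"; the paper's Logic Programming machinery is of course far richer than this.

```python
U = None  # the third truth value: "unknown"

def k_not(a):
    return U if a is U else (not a)

def k_and(a, b):
    if a is False or b is False:
        return False          # a definite False wins regardless of unknowns
    if a is U or b is U:
        return U
    return True

def k_or(a, b):
    if a is True or b is True:
        return True           # a definite True wins regardless of unknowns
    if a is U or b is U:
        return U
    return False

# "It rains and the match is cancelled": with the second fact unknown,
# the conjunction stays unknown rather than being forced to True or False.
print(k_and(True, U))   # -> None (unknown)
print(k_or(True, U))    # -> True
```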

https://www.microsoft.com/en-us/research/publication/identifying-unknown-unknowns-open-world-representations-policies-guided-exploration/ Identifying Unknown Unknowns in the Open World: Representations and Policies for Guided Exploration

Predictive models deployed in the real world may assign incorrect labels to instances with high confidence. Such errors, or unknown unknowns, are rooted in model incompleteness, and typically arise because of a mismatch between the training data and the cases encountered at test time. As the models are blind to such errors, input from an oracle is needed to identify these failures. In this paper, we formulate and address the problem of informed discovery of unknown unknowns of any given predictive model, where unknown unknowns occur due to systematic biases in the training data. We propose a model-agnostic methodology which uses feedback from an oracle both to identify unknown unknowns and to intelligently guide the discovery. We employ a two-phase approach which first organizes the data into multiple partitions based on the feature similarity of instances and the confidence scores assigned by the predictive model, and then utilizes an explore-exploit strategy for discovering unknown unknowns across these partitions. We demonstrate the efficacy of our framework by varying the underlying causes of unknown unknowns across various applications. To the best of our knowledge, this paper presents the first algorithmic approach to the problem of discovering unknown unknowns of predictive models.
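
A hypothetical sketch of the two-phase shape of the method: cluster the points the model labels confidently, then let a simple bandit spend the oracle budget on partitions where high-confidence mistakes keep turning up. The k-means partitioning and noisy-greedy bandit below are stand-ins; the paper's partitioning and exploration strategies differ in detail.

```python
import numpy as np
from sklearn.cluster import KMeans

def discover_unknown_unknowns(X, model_preds, oracle, budget=100, n_parts=10, seed=0):
    """oracle(i) is assumed to return the true label of instance i."""
    rng = np.random.default_rng(seed)
    parts = KMeans(n_clusters=n_parts, n_init=10, random_state=seed).fit_predict(X)
    pools = [list(np.flatnonzero(parts == k)) for k in range(n_parts)]
    hits = np.ones(n_parts)       # optimistic prior on each partition's hit rate
    pulls = np.ones(n_parts)
    found = []
    for _ in range(budget):
        open_parts = [k for k in range(n_parts) if pools[k]]
        if not open_parts:
            break
        # exploit the best observed hit rate, with noise for exploration
        k = max(open_parts, key=lambda j: hits[j] / pulls[j] + rng.normal(0, 0.05))
        i = pools[k].pop(rng.integers(len(pools[k])))
        pulls[k] += 1
        if oracle(i) != model_preds[i]:   # a high-confidence mistake
            hits[k] += 1
            found.append(i)               # an unknown unknown
    return found
```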

http://dustintran.com/blog/deep-and-hierarchical-implicit-models Deep and Hierarchical Implicit Models

Implicit probabilistic models are a very flexible class for modeling data. They define a process to simulate observations, and unlike traditional models, they do not require a tractable likelihood function. In this paper, we develop two families of models: hierarchical implicit models and deep implicit models. They combine the idea of implicit densities with hierarchical Bayesian modeling and deep neural networks. The use of implicit models with Bayesian analysis has in general been limited by our ability to perform accurate and scalable inference. We develop a variational inference algorithm for implicit models. Key to our method is specifying a variational family that is also implicit. This matches the model's flexibility and allows for accurate approximation of the posterior. Our method scales up implicit models to sizes previously not possible and opens the door to new modeling designs. We demonstrate diverse applications: a large-scale physical simulator for predator-prey populations in ecology; a Bayesian generative adversarial network for discrete data; and a deep implicit model for text generation.
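
The primitive that makes an implicit variational family workable is density-ratio estimation: a classifier trained to separate samples from q and p recovers log q(z) - log p(z) at its optimum, so a KL term can be estimated without ever evaluating q's density. A minimal PyTorch sketch of that primitive alone (a loose illustration, not the paper's full inference algorithm):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

prior = torch.distributions.Normal(0.0, 1.0)       # tractable p(z)
noise = torch.distributions.Normal(0.0, 1.0)       # noise driving implicit q

gen = nn.Sequential(nn.Linear(1, 16), nn.Tanh(), nn.Linear(16, 1))   # z = g(eps)
disc = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))  # ratio net

opt = torch.optim.Adam(disc.parameters(), lr=1e-2)
for _ in range(500):
    z_q = gen(noise.sample((256, 1))).detach()     # samples from implicit q
    z_p = prior.sample((256, 1))                   # samples from p
    logits = torch.cat([disc(z_q), disc(z_p)])
    labels = torch.cat([torch.ones(256, 1), torch.zeros(256, 1)])
    loss = F.binary_cross_entropy_with_logits(logits, labels)
    opt.zero_grad(); loss.backward(); opt.step()

# At the optimum, disc(z) ~= log q(z) - log p(z), so KL(q || p) is estimated
# by averaging disc over fresh samples from q -- no density for q required.
with torch.no_grad():
    kl_estimate = disc(gen(noise.sample((4096, 1)))).mean()
```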

http://alexgkendall.com/computer_vision/bayesian_deep_learning_for_safe_ai/

https://arxiv.org/pdf/1705.07115v1.pdf Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics

We propose a principled approach to multi-task deep learning which weighs multiple loss functions by considering the homoscedastic uncertainty of each task. This allows us to simultaneously learn various quantities with different units or scales in both classification and regression settings.
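
The weighting scheme reduces to a small module: each task i gets a learned log-variance s_i and contributes exp(-s_i) * L_i + s_i to the combined objective, so noisy or large-scale tasks are automatically down-weighted. A minimal PyTorch sketch with constants dropped (the paper derives slightly different scalings for regression and classification):

```python
import torch
import torch.nn as nn

class UncertaintyWeightedLoss(nn.Module):
    def __init__(self, n_tasks: int):
        super().__init__()
        self.log_vars = nn.Parameter(torch.zeros(n_tasks))  # s_i = log sigma_i^2

    def forward(self, task_losses):
        total = 0.0
        for s, loss in zip(self.log_vars, task_losses):
            total = total + torch.exp(-s) * loss + s  # precision weight + penalty
        return total

# Usage: optimise the log-variances jointly with the network weights.
weigher = UncertaintyWeightedLoss(n_tasks=2)
seg_loss, depth_loss = torch.tensor(2.3), torch.tensor(0.07)
combined = weigher([seg_loss, depth_loss])
```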

https://arxiv.org/pdf/1703.04730.pdf Understanding Black-box Predictions via Influence Functions

How can we explain the predictions of a black-box model? In this paper, we use influence functions, a classic technique from robust statistics, to trace a model's prediction through the learning algorithm and back to its training data, thereby identifying the training points most responsible for a given prediction. To scale up influence functions to modern machine learning settings, we develop a simple, efficient implementation that requires only oracle access to gradients and Hessian-vector products. We show that even on non-convex and non-differentiable models where the theory breaks down, approximations to influence functions can still provide valuable information. On linear models and convolutional neural networks, we demonstrate that influence functions are useful for multiple purposes: understanding model behavior, debugging models, detecting dataset errors, and even creating visually indistinguishable training-set attacks.

https://worksheets.codalab.org/worksheets/0x2b314dc3536b482dbba02783a24719fd/
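
For a model small enough to solve against the Hessian directly, the influence of training point i on a test loss is just -grad L(z_test)^T H^{-1} grad L(z_i). A minimal numpy sketch for L2-regularised logistic regression (the paper's contribution is doing this at scale via Hessian-vector products, which this sketch does not attempt):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def example_grad(w, x, y):
    """Gradient of one example's logistic loss (labels y in {0, 1})."""
    return (sigmoid(x @ w) - y) * x

def objective_hessian(w, X, lam):
    """Hessian of the regularised empirical risk at the fitted w."""
    p = sigmoid(X @ w)
    H = (X * (p * (1.0 - p))[:, None]).T @ X / len(X)
    return H + lam * np.eye(len(w))

def influence_on_test_loss(w, X, y, x_test, y_test, i, lam=1e-2):
    """Predicted change in the test loss from upweighting training point i."""
    h_inv_g = np.linalg.solve(objective_hessian(w, X, lam),
                              example_grad(w, X[i], y[i]))
    return -example_grad(w, x_test, y_test) @ h_inv_g  # large |value| = influential
```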

https://eng.uber.com/neural-networks-uncertainty-estimation/

https://m.facebook.com/yann.lecun/posts/10154058859142143

https://arxiv.org/pdf/1709.02249.pdf Uncertainty-Aware Learning from Demonstration using Mixture Density Networks with Sampling-Free Variance Modeling

https://arxiv.org/abs/1710.04759 Bayesian Hypernetworks

In contrast to most methods for Bayesian deep learning, Bayesian hypernets can represent a complex multimodal approximate posterior with correlations between parameters, while enabling cheap i.i.d. sampling from q(θ). We demonstrate these qualitative advantages of Bayesian hypernets, which also achieve competitive performance on a suite of tasks that demonstrate the advantage of estimating model uncertainty, including active learning and anomaly detection.
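
The sampling side of the idea is easy to sketch: a generator network maps noise to the primary network's weight vector, so each i.i.d. draw from q(θ) costs one forward pass. This minimal PyTorch sketch shows only that mechanic; the paper's actual hypernets are invertible and trained variationally, which is omitted here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

IN, HID, OUT, NOISE = 4, 8, 1, 16
N_WEIGHTS = IN * HID + HID + HID * OUT + OUT   # parameter count of primary net

# hypernet: noise -> one full weight vector theta for the primary net
hypernet = nn.Sequential(nn.Linear(NOISE, 64), nn.ReLU(), nn.Linear(64, N_WEIGHTS))

def primary_forward(theta: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
    """Run the small primary MLP with weights unpacked from one theta sample."""
    i = 0
    W1 = theta[i:i + IN * HID].view(HID, IN); i += IN * HID
    b1 = theta[i:i + HID]; i += HID
    W2 = theta[i:i + HID * OUT].view(OUT, HID); i += HID * OUT
    b2 = theta[i:i + OUT]
    return F.linear(torch.tanh(F.linear(x, W1, b1)), W2, b2)

# Cheap i.i.d. posterior samples: one noise draw per theta sample.
x = torch.randn(32, IN)
preds = torch.stack([primary_forward(hypernet(torch.randn(NOISE)), x)
                     for _ in range(20)])
mean, std = preds.mean(0), preds.std(0)        # predictive mean and spread
```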

https://arxiv.org/abs/1805.11783 To Trust Or Not To Trust A Classifier

We propose a new score, called the trust score, which measures the agreement between the classifier and a modified nearest-neighbor classifier on the testing example. We show empirically that high (low) trust scores produce surprisingly high precision at identifying correctly (incorrectly) classified examples, consistently outperforming the classifier's confidence score as well as many other baselines. Further, under some mild distributional assumptions, we show that if the trust score for an example is high (low), the classifier will likely agree (disagree) with the Bayes-optimal classifier. Our guarantees consist of non-asymptotic rates of statistical consistency under various nonparametric settings and build on recent developments in topological data analysis.
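
The heart of the score is a nearest-neighbor distance ratio, sketched below under simplifying assumptions: the paper additionally filters the training set to a high-density subset before measuring distances, which this sketch omits.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def fit_class_neighbors(X_train, y_train):
    """One 1-NN index per class, built over that class's training points."""
    return {c: NearestNeighbors(n_neighbors=1).fit(X_train[y_train == c])
            for c in np.unique(y_train)}

def trust_score(neighbors, x, predicted_class, eps=1e-12):
    """Distance to the nearest other class over distance to the predicted class."""
    d = {c: knn.kneighbors(x.reshape(1, -1))[0][0, 0]
         for c, knn in neighbors.items()}
    d_pred = d.pop(predicted_class)
    return min(d.values()) / (d_pred + eps)   # > 1: closer to the predicted class
```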

https://arxiv.org/abs/1812.10687 Robustness to Out-of-Distribution Inputs via Task-Aware Generative Uncertainty