# Explanatory Patterns

**Note: This chapter is undergoing massive refactoring**

Neural networks were originally designed to perform classification, and to do so within restricted domains (e.g., vision and speech recognition). Researchers have also conveniently been able to run comparative benchmarks between their proposed methods and earlier work; a majority of deep learning papers earn publication on the strength of benchmark results. Unfortunately, in real-world practice the luxury of an existing comparative benchmark is rarely available. Furthermore, many of the answers people seek go beyond classification.

Recent trends have shown deep learning systems to be capable of output that goes beyond classification. We have seen systems that take images as input and output captions describing their contents in text. There are systems that take audio and return a text transcript. Even more impressive is real-time translation between languages. Generative models take a few parameters and are able to produce realistic images as well as human-sounding speech.

In this chapter, we explore the different kinds of output, such as recommendations and predictions, that neural networks have delivered in practice. This will give the reader a better appreciation of the breadth of applications of neural networks.

An explanation relates a concept to a larger context that makes it “more understandable” to another party. This differs from an evaluation: an evaluation makes use of some metric for deciding whether one thing is better or worse than another.

Lower-dimensional visualization (t-SNE): https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding

Quantile Regression

Ranking
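Of these techniques, quantile regression admits a compact illustration. The sketch below is plain NumPy; the learning rate, step count, and synthetic data are arbitrary choices of mine. It estimates a target quantile by gradient descent on the pinball loss:

```python
import numpy as np

def pinball_grad(y, pred, tau):
    """Gradient of the pinball (quantile) loss w.r.t. the prediction:
    the loss is tau*(y - pred) when y >= pred, else (1 - tau)*(pred - y)."""
    return np.where(y >= pred, -tau, 1.0 - tau)

def fit_quantile(y, tau, lr=0.05, steps=2000):
    """Estimate the tau-quantile of y by gradient descent on the
    pinball loss, using a single constant predictor."""
    pred = float(np.mean(y))
    for _ in range(steps):
        pred -= lr * np.mean(pinball_grad(y, pred, tau))
    return pred

rng = np.random.default_rng(0)
y = rng.normal(0.0, 1.0, size=10_000)
q90 = fit_quantile(y, tau=0.9)   # close to the 90th percentile of y
```

The same loss can replace squared error in any regression network: with tau = 0.9 the prediction is driven toward the 90th percentile of the targets rather than their mean, which is how neural networks produce prediction intervals instead of point estimates.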

**References**

http://research.microsoft.com/pubs/192937/Transactions-APSIPA.pdf

http://arxiv.org/abs/1603.02518v2 A New Method to Visualize Deep Neural Networks

http://arxiv.org/pdf/1606.07112v1.pdf Visualizing Dynamics: from t-SNE to SEMI-MDPs

In this work we considered the problem of visualizing dynamics, starting with a t-SNE map of the neural activations of a DQN and ending with an SMDP model describing the underlying dynamics. We developed clustering algorithms that take the temporal aspects of the data into account, and defined quantitative criteria to rank candidate SMDP models based on the likelihood of the data and an entropy simplicity term.

http://arxiv.org/abs/1606.06959v1 Dealing with a large number of classes – Likelihood, Discrimination or Ranking?

In contrast to recently introduced alternative approaches, a simple approximation of the standard maximum likelihood objective provides an easily implementable and competitive method for fast large-class classification.
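One common form of such an approximation (a sketch under my own assumptions, not necessarily the exact method of the paper) is sampled softmax: normalize over the target class plus a small random sample of negatives instead of over all classes:

```python
import numpy as np

rng = np.random.default_rng(1)
num_classes, dim = 50_000, 32

# Hypothetical output layer of a large-class classifier.
W = rng.normal(0, 0.01, size=(num_classes, dim))
h = rng.normal(0, 1.0, size=dim)   # hidden representation of one example
target = 123

def sampled_softmax_loss(W, h, target, num_sampled, rng):
    """Approximate full-softmax cross-entropy by normalizing over the
    target class plus a small uniform sample of negative classes."""
    negatives = rng.choice(len(W), size=num_sampled, replace=False)
    negatives = negatives[negatives != target]
    classes = np.concatenate(([target], negatives))
    logits = W[classes] @ h            # only num_sampled+1 dot products
    logits -= logits.max()             # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[0]               # target sits at index 0

loss = sampled_softmax_loss(W, h, target, num_sampled=500, rng=rng)
```

Each training step touches only a few hundred rows of W instead of all 50,000, which is where the speedup for large-class problems comes from.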

http://bluepiit.com/blog/classifying-recommender-systems/

http://www.bgu.ac.il/~shanigu/Publications/EvaluationMetrics.17.pdf Evaluating Recommendation Systems

http://arxiv.org/pdf/1606.07129v1.pdf Explainable Restricted Boltzmann Machines for Collaborative Filtering

In this paper, we focus on RBM based collaborative filtering recommendations, and further assume the absence of any additional data source, such as item content or user attributes. We thus propose a new Explainable RBM technique that computes the top-n recommendation list from items that are explainable.
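One natural way to operationalize “explainable” is neighbor support. The sketch below is my own illustrative stand-in, not the paper's RBM technique: it marks an item explainable for a user when enough of the user's nearest neighbors rated it, and recommends the top-n such unrated items:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical binary user-item rating matrix (20 users x 12 items).
R = rng.binomial(1, 0.3, size=(20, 12)).astype(float)

def explainable_topn(R, user, n=3, k=5, theta=0.4):
    """Recommend up to n unrated items ranked by neighbor support,
    keeping only items rated by at least a theta-fraction of the
    user's k nearest neighbors (cosine similarity of rating rows)."""
    norms = np.linalg.norm(R, axis=1) + 1e-12
    sims = (R @ R[user]) / (norms * norms[user])
    sims[user] = -np.inf                     # exclude the user themselves
    neighbors = np.argsort(sims)[-k:]
    support = R[neighbors].mean(axis=0)      # fraction of neighbors rating each item
    support[R[user] > 0] = -np.inf           # exclude already-rated items
    ranked = np.argsort(support)[::-1]
    return [i for i in ranked[:n] if support[i] >= theta]

recs = explainable_topn(R, user=0)
```

Each recommendation carries its explanation for free: "k of your nearest neighbors also chose this item".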

https://www.quora.com/What-product-breakthroughs-will-recent-advances-in-deep-learning-enable

http://www.math.wm.edu/~leemis/2008amstat.pdf

http://www.datasciencecentral.com/profiles/blogs/40-techniques-used-by-data-scientists

https://arxiv.org/abs/1606.08813 EU regulations on algorithmic decision-making and a “right to explanation”

We summarize the potential impact that the European Union's new General Data Protection Regulation will have on the routine use of machine learning algorithms. Slated to take effect as law across the EU in 2018, it will restrict automated individual decision-making (that is, algorithms that make decisions based on user-level predictors) which “significantly affect” users. The law will also create a “right to explanation,” whereby a user can ask for an explanation of an algorithmic decision that was made about them. We argue that while this law will pose large challenges for industry, it highlights opportunities for machine learning researchers to take the lead in designing algorithms and evaluation frameworks which avoid discrimination.

http://blog.mortardata.com/post/55692866435/recommender-systems-best-talks

https://pdfs.semanticscholar.org/4adc/c5d7429adcea3b365a223fac2441880cab28.pdf

| Aim | Description |
| --- | --- |
| Transparency (Tra.) | Explain how the system works |
| Scrutability (Scr.) | Allow users to tell the system it is wrong |
| Trust | Increase users’ confidence in the system |
| Effectiveness (Efk.) | Help users make good decisions |
| Persuasiveness (Pers.) | Convince users to try or buy |
| Efficiency (Efc.) | Help users make decisions faster |
| Satisfaction (Sat.) | Increase the ease of usability or enjoyment |


https://yanirseroussi.com/2015/10/02/the-wonderful-world-of-recommender-systems/

http://arxiv.org/pdf/1607.04228v1.pdf Fifty Shades of Ratings: How to Benefit from a Negative Feedback in Top-N Recommendations Tasks

http://arogozhnikov.github.io/2016/04/28/demonstrations-for-ml-courses.html

http://www.deeplearningpatterns.com/doku.php/model_interpretability

http://www.maths.bris.ac.uk/~madjl/finestructure/Lawson2012-GeneticSimilarityClustering.pdf

Similarity matrices and clustering algorithms for population identification using genetic data

Recent results show that the information used by both model-based clustering methods and Principal Components Analysis can be summarised by a matrix of pairwise similarity measures between individuals. Similarity matrices have been constructed in a number of ways, usually treating markers as independent but differing in the weighting given to polymorphisms of different frequencies.

We review several such matrices and evaluate their ‘information content’. A two-stage approach for population identification is to first construct a similarity matrix, and then perform clustering. We review a range of common clustering algorithms, and evaluate their performance through a simulation study. The clustering step can be performed either directly, or after using a dimension reduction technique such as Principal Components Analysis, which we find substantially improves the performance of most algorithms. Based on these results, we describe the population structure signal contained in each similarity matrix, finding that accounting for linkage leads to significant improvements for sequence data. We also perform a comparison on real data, where we find that population genetics models outperform generic clustering approaches, particularly with regard to robustness against features such as relatedness between individuals.
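The two-stage recipe (similarity matrix, then dimension reduction, then clustering) can be sketched in a few lines. Everything below (the two synthetic populations, the marker count, the equal marker weighting) is an assumption of mine for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical genotype-like data: two populations whose allele
# frequencies differ across 200 binary markers.
n_per_pop, n_markers = 30, 200
freq_a = rng.uniform(0.1, 0.9, n_markers)
freq_b = np.clip(freq_a + rng.choice([-0.3, 0.3], size=n_markers), 0.05, 0.95)
X = np.vstack([
    rng.binomial(1, freq_a, size=(n_per_pop, n_markers)),
    rng.binomial(1, freq_b, size=(n_per_pop, n_markers)),
]).astype(float)

# Stage 1: pairwise similarity matrix (centered markers, equal weights).
Xc = X - X.mean(axis=0)
S = Xc @ Xc.T / n_markers

# Stage 2: dimension reduction (the top eigenvector of S is the leading
# principal component) followed by a trivial two-way clustering.
eigvals, eigvecs = np.linalg.eigh(S)   # eigenvalues in ascending order
pc1 = eigvecs[:, -1]                   # eigenvector of the largest eigenvalue
labels = (pc1 > 0).astype(int)
```

With well-separated populations the sign of the first principal component already recovers the two groups; real data needs the more careful weighting and linkage corrections the paper reviews.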

http://arxiv.org/pdf/1602.02867v1.pdf Value Iteration Networks

We introduce the value iteration network: a fully differentiable neural network with a ‘planning module’ embedded within. Value iteration networks are suitable for making predictions about outcomes that involve planning-based reasoning, such as predicting a desired trajectory from an observation of a map.
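The ‘planning module’ embedded in a VIN is classical value iteration expressed as differentiable layers. As a point of reference, the snippet below runs ordinary, non-differentiable value iteration on a toy problem of my own choosing (a deterministic 4x4 gridworld with step cost -1 and discount 0.9):

```python
import numpy as np

n = 4
gamma = 0.9
goal = (3, 3)
actions = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right

V = np.zeros((n, n))
for _ in range(100):                 # Bellman backups until convergence
    V_new = np.zeros_like(V)
    for i in range(n):
        for j in range(n):
            if (i, j) == goal:
                continue             # absorbing goal state, value 0
            best = -np.inf
            for di, dj in actions:
                ni = min(max(i + di, 0), n - 1)   # walls clip movement
                nj = min(max(j + dj, 0), n - 1)
                best = max(best, -1.0 + gamma * V[ni, nj])
            V_new[i, j] = best
    V = V_new
```

A VIN replaces the explicit max over actions with a max-pooling layer and the transition lookup with a convolution, so the same computation becomes end-to-end trainable.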

http://arxiv.org/abs/1603.08023 How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation

http://arxiv.org/abs/1606.03490v2 The Mythos of Model Interpretability

https://auduno.github.io/2016/06/18/peeking-inside-convnets Peeking inside Convnets

Convolutional neural networks are used extensively for a number of image-related tasks these days. Despite being very successful, they're mostly seen as “black box” models, since it's hard to understand what happens inside the network. There are, however, methods to “peek inside” convnets and thus understand a bit more about how they work.
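One of the simplest such peeking methods is activation maximization: gradient ascent on the input, under a norm constraint, to find the pattern a unit responds to most strongly. The toy sketch below is my simplification, with a single linear unit standing in for a convnet neuron:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical single linear unit standing in for a convnet neuron.
w = rng.normal(size=64)

# Start from a random input on the unit sphere and ascend the
# activation gradient, projecting back to the sphere each step.
x = rng.normal(size=64)
x /= np.linalg.norm(x)
for _ in range(200):
    grad = w                      # d(w @ x)/dx for a linear unit
    x = x + 0.1 * grad
    x /= np.linalg.norm(x)

cosine = (w @ x) / np.linalg.norm(w)   # alignment with the weight vector
```

For a linear unit the optimum is simply the weight vector itself; in a deep convnet the same loop, run through backpropagation with image-space regularizers, produces the familiar dream-like feature visualizations.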

https://ganguli-gang.stanford.edu/pdf/DeepKnowledgeTracing.pdf

http://nautil.us/issue/40/learning/is-artificial-intelligence-permanently-inscrutable

https://arxiv.org/abs/1609.07982v1 Optimistic and Pessimistic Neural Networks for Scene and Object Recognition

http://distill.pub/2016/misread-tsne/ How to Use t-SNE Effectively

https://arxiv.org/pdf/1606.04155v1.pdf Rationalizing Neural Predictions

Prediction without justification has limited applicability. As a remedy, we learn to extract pieces of input text as justifications – rationales – that are tailored to be short and coherent, yet sufficient for making the same prediction. Our approach combines two modular components, generator and encoder, which are trained to operate well together. **The generator specifies a distribution over text fragments as candidate rationales and these are passed through the encoder for prediction.** Rationales are never given during training. Instead, the model is regularized by desiderata for rationales. We evaluate the approach on multi-aspect sentiment analysis against manually annotated test cases. Our approach outperforms an attention-based baseline by a significant margin. We also successfully illustrate the method on the question retrieval task.

http://openreview.net/pdf?id=SyVVJ85lg Paleo: A Performance Model for Deep Neural Networks

http://openreview.net/pdf?id=ryF7rTqgl Understanding Intermediate Layers Using Linear Classifier Probes

https://arxiv.org/abs/1409.2944 Collaborative Deep Learning for Recommender Systems

To address this problem, we generalize recent advances in deep learning from i.i.d. input to non-i.i.d. (CF-based) input and propose in this paper a hierarchical Bayesian model called collaborative deep learning (CDL), which jointly performs deep representation learning for the content information and collaborative filtering for the ratings (feedback) matrix.

https://arxiv.org/abs/1605.01713 Not Just a Black Box: Learning Important Features Through Propagating Activation Differences

DeepLIFT (Learning Important FeaTures) is an efficient and effective method for computing importance scores in a neural network. DeepLIFT compares the activation of each neuron to its 'reference activation' and assigns contribution scores according to the difference. We apply DeepLIFT to models trained on natural images and genomic data, and show significant advantages over gradient-based methods.
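For a single linear unit the difference-from-reference idea reduces to one line, and the contribution scores sum exactly to the change in output (DeepLIFT's summation-to-delta property). A minimal NumPy sketch, with weights and inputs invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear scorer standing in for one neuron's input.
w = rng.normal(size=5)
b = 0.3
f = lambda x: w @ x + b

x = rng.normal(size=5)          # actual input
x_ref = np.zeros(5)             # input producing the 'reference activation'

# Difference-from-reference contribution scores (DeepLIFT's linear
# rule): each input's share of the change in output.
contrib = w * (x - x_ref)

# Summation-to-delta: contributions account for exactly f(x) - f(x_ref).
assert np.isclose(contrib.sum(), f(x) - f(x_ref))
```

The nontrivial part of DeepLIFT is propagating such scores through nonlinearities, where plain gradients can vanish or saturate while the difference-from-reference signal does not.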

https://www.youtube.com/watch?v=gjk0N5Qltfg Structured Prediction

https://arxiv.org/abs/1605.07588 A Consistent Regularization Approach for Structured Prediction

https://arxiv.org/pdf/1603.06143v2.pdf Neurally-Guided Procedural Models: Amortized Inference for Procedural Graphics Programs using Neural Networks

http://www.darpa.mil/program/explainable-artificial-intelligence

https://arxiv.org/abs/1706.07269v1 Explanation in Artificial Intelligence: Insights from the Social Sciences

There exists vast and valuable bodies of research in philosophy, psychology, and cognitive science of how people define, generate, select, evaluate, and present explanations. This paper argues that the field of explainable artificial intelligence should build on this existing research, and reviews relevant papers from philosophy, cognitive psychology/science, and social psychology, which study these topics. It draws out some important findings, and discusses ways that these can be infused with work on explainable artificial intelligence.

https://arxiv.org/abs/1706.08606v1 Cognitive Psychology for Deep Neural Networks: A Shape Bias Case Study

Deep neural networks (DNNs) have advanced performance on a wide range of complex tasks, rapidly outpacing our understanding of the nature of their solutions. While past work sought to advance our understanding of these models, none has made use of the rich history of problem descriptions, theories, and experimental methods developed by cognitive psychologists to study the human mind. To explore the potential value of these tools, we chose a well-established analysis from developmental psychology that explains how children learn word labels for objects, and applied that analysis to DNNs. Using datasets of stimuli inspired by the original cognitive psychology experiments, we find that state-of-the-art one shot learning models trained on ImageNet exhibit a similar bias to that observed in humans: they prefer to categorize objects according to shape rather than color. The magnitude of this shape bias varies greatly among architecturally identical, but differently seeded models, and even fluctuates within seeds throughout training, despite nearly equivalent classification performance. These results demonstrate the capability of tools from cognitive psychology for exposing hidden computational properties of DNNs, while concurrently providing us with a computational model for human word learning.

https://github.com/slundberg/shap A unified approach to explain the output of any machine learning model

https://arxiv.org/abs/1803.04765 Deep k-Nearest Neighbors: Towards Confident, Interpretable and Robust Deep Learning

This hybrid classifier combines the k-nearest neighbors algorithm with representations of the data learned by each layer of the DNN: a test input is compared to its neighboring training points according to the distance that separates them in the representations.
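A minimal sketch of the idea, with a random untrained two-layer network and synthetic clusters of my own invention standing in for a real DNN and training set:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-layer network with fixed random weights; DkNN only
# needs the layer representations, not a trained model.
W1 = rng.normal(size=(2, 8))
W2 = rng.normal(size=(8, 8))

def layer_reps(X):
    h1 = np.maximum(X @ W1, 0.0)          # layer-1 representation
    h2 = np.maximum(h1 @ W2, 0.0)         # layer-2 representation
    return [h1, h2]

# Two well-separated training clusters with labels 0 and 1.
X_train = np.vstack([rng.normal(-2, 0.3, (50, 2)),
                     rng.normal(+2, 0.3, (50, 2))])
y_train = np.array([0] * 50 + [1] * 50)

def dknn_predict(x, k=5):
    """Collect the labels of the k nearest training points in every
    layer's representation space; the prediction is the majority label
    and the 'credibility' is its fraction of all collected votes."""
    votes = []
    for rep_x, rep_train in zip(layer_reps(x[None, :]), layer_reps(X_train)):
        d = np.linalg.norm(rep_train - rep_x, axis=1)
        votes.extend(y_train[np.argsort(d)[:k]])
    votes = np.array(votes)
    pred = np.bincount(votes).argmax()
    return pred, (votes == pred).mean()

pred, cred = dknn_predict(np.array([2.1, 1.9]))
```

The credibility score is what makes the classifier interpretable: a low value means the layers disagree about which training points the input resembles, flagging out-of-distribution or adversarial inputs.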

https://arxiv.org/abs/1804.03126v1 Data2Vis: Automatic Generation of Data Visualizations Using Sequence-to-Sequence Recurrent Neural Networks