**This is an old revision of the document!**

https://arxiv.org/abs/1606.04155 Rationalizing Neural Predictions

Prediction without justification has limited applicability. As a remedy, we learn to extract pieces of input text as justifications – rationales – that are tailored to be short and coherent, yet sufficient for making the same prediction. **Our approach combines two modular components, generator and encoder, which are trained to operate well together. The generator specifies a distribution over text fragments as candidate rationales and these are passed through the encoder for prediction.** Rationales are never given during training. Instead, the model is regularized by desiderata for rationales. We evaluate the approach on multi-aspect sentiment analysis against manually annotated test cases. Our approach outperforms attention-based baseline by a significant margin. We also successfully illustrate the method on the question retrieval task.

Example-based explanations are widely used in the effort to improve the interpretability of highly complex distributions. However, prototypes alone are rarely sufficient to represent the gist of the complexity. In order for users to construct better mental models and understand complex data distributions, we also need criticism to explain what are not captured by prototypes. Motivated by the Bayesian model criticism framework, we develop MMD-critic which efficiently learns prototypes and criticism, designed to aid human interpretability. A human subject pilot study shows that the MMD-critic selects prototypes and criticism that are useful to facilitate human understanding and reasoning. We also evaluate the prototypes selected by MMD-critic via a nearest prototype classifier, showing competitive performance compared to baselines.

https://arxiv.org/pdf/1612.04757v1.pdf Attentive Explanations: Justifying Decisions and Pointing to the Evidence

Deep models are the defacto standard in visual decision models due to their impressive performance on a wide array of visual tasks. However, they are frequently seen as opaque and are unable to explain their decisions. In contrast, humans can justify their decisions with natural language and point to the evidence in the visual world which led to their decisions. We postulate that deep models can do this as well and propose our Pointing and Justification (PJ-X) model which can justify its decision with a sentence and point to the evidence by introspecting its decision and explanation process using an attention mechanism. Unfortunately there is no dataset available with reference explanations for visual decision making. We thus collect two datasets in two domains where it is interesting and challenging to explain decisions. First, we extend the visual question answering task to not only provide an answer but also a natural language explanation for the answer. Second, we focus on explaining human activities which is traditionally more challenging than object classification. We extensively evaluate our PJ-X model, both on the justification and pointing tasks, by comparing it to prior models and ablations using both automatic and human evaluations.

https://arxiv.org/abs/1612.07843 “What is Relevant in a Text Document?”: An Interpretable Machine Learning Approach

In this paper, we demonstrate that such understanding can be achieved by tracing the classification decision back to individual words using layer-wise relevance propagation (LRP), a recently developed technique for explaining predictions of complex non-linear classifiers. We train two word-based ML models, a convolutional neural network (CNN) and a bag-of-words SVM classifier, on a topic categorization task and adapt the LRP method to decompose the predictions of these models onto words. Resulting scores indicate how much individual words contribute to the overall classification decision. This enables one to distill relevant information from text documents without an explicit semantic information extraction step. We further use the word-wise relevance scores for generating novel vector-based document representations which capture semantic information.

https://www.quantamagazine.org/20150723-computer-explanation/

http://www.scottreed.info/files/iclr2017.pdf GENERATING INTERPRETABLE IMAGES WITH CONTROLLABLE STRUCTURE

https://arxiv.org/abs/1612.08994v1 Here's My Point: Argumentation Mining with Pointer Networks

This work provides the first neural network-based approach to argumentation mining, focusing on the two tasks of extracting links between argument components, and classifying types of argument components. In order to solve this problem, we propose to use a joint model that is based on a Pointer Network architecture. A Pointer Network is appealing for this task for the following reasons: 1) It takes into account the sequential nature of argument components; 2) By construction, it enforces certain properties of the tree structure present in argument relations; 3) The hidden representations can be applied to auxiliary tasks. In order to extend the contribution of the original Pointer Network model, we construct a joint model that simultaneously attempts to learn the type of argument component, as well as continuing to predict links between argument components. The proposed joint model achieves state-of-the-art results on two separate evaluation corpora, achieving far superior performance than a regular Pointer Network model. Our results show that optimizing for both tasks, and adding a fully-connected layer prior to recurrent neural network input, is crucial for high performance.

https://arxiv.org/pdf/1612.08220v2.pdf Understanding Neural Networks through Representation Erasure

While neural networks have been successfully applied to many natural language processing tasks, they come at the cost of interpretability. In this paper, we propose a general methodology to analyze and interpret decisions from a neural model by observing the effects on the model of erasing various parts of the representation, such as input word-vector dimensions, intermediate hidden units, or input words. We present several approaches to analyzing the effects of such erasure, from computing the relative difference in evaluation metrics, to using reinforcement learning to erase the minimum set of input words in order to flip a neural model’s decision. In a comprehensive analysis of multiple NLP tasks, including linguistic feature classification, sentence-level sentiment analysis, and document level sentiment aspect prediction, we show that the proposed methodology not only offers clear explanations about neural model decisions, but also provides a way to conduct error analysis on neural models.

https://arxiv.org/abs/1702.07826v1 Rationalization: A Neural Machine Translation Approach to Generating Natural Language Explanations

We introduce AI rationalization, an approach for generating explanations of autonomous system behavior as if a human had done the behavior. We describe a rationalization technique that uses neural machine translation to translate internal state-action representations of the autonomous agent into natural language. We evaluate our technique in the Frogger game environment. The natural language is collected from human players thinking out loud as they play the game. We motivate the use of rationalization as an approach to explanation generation, show the results of experiments on the accuracy of our rationalization technique, and describe future research agenda.

https://arxiv.org/pdf/1512.02479v1.pdf Explaining NonLinear Classification Decisions with Deep Taylor Decomposition

In this paper we introduce a novel methodology for interpreting generic multilayer neural networks by decomposing the network classification decision into contributions of its input elements. Although our focus is on image classification, the method is applicable to a broad set of input data, learning tasks and network architectures. Our method is based on deep Taylor decomposition and efficiently utilizes the structure of the network by backpropagating the explanations from the output to the input layer.

https://arxiv.org/pdf/1705.03633v1.pdf Inferring and Executing Programs for Visual Reasoning

Inspired by module networks, this paper proposes a model for visual reasoning that consists of a program generator that constructs an explicit representation of the reasoning process to be performed, and an execution engine that executes the resulting program to produce an answer. https://github.com/facebookresearch/clevr-iep

https://arxiv.org/abs/1705.04146 Program Induction by Rationale Generation:Learning to Solve and Explain Algebraic Word Problems

Solving algebraic word problems requires executing a series of arithmetic operations—a program—to obtain a final answer. However, since programs can be arbitrarily complicated, inducing them directly from question-answer pairs is a formidable challenge. To make this task more feasible, we solve these problems by generating answer rationales, sequences of natural language and human-readable mathematical expressions that derive the final answer through a series of small steps. Although rationales do not explicitly specify programs, they provide a scaffolding for their structure via intermediate milestones. To evaluate our approach, we have created a new 100,000-sample dataset of questions, answers and rationales. Experimental results show that indirect supervision of program learning via answer rationales is a promising strategy for inducing arithmetic programs.

https://arxiv.org/abs/1707.01561v1 Automatic Generation of Natural Language Explanations

In this paper, we propose a method for the automatic generation of natural language explanations, for predicting how a user would write about an item, based on user ratings from different items' features. We design a character-level recurrent neural network (RNN) model, which generates an item's review explanations using long-short term memories (LSTM). The model generates text reviews given a combination of the review and ratings score that express opinions about different factors or aspects of an item. Our network is trained on a sub-sample from the large real-world dataset BeerAdvocate. Our empirical evaluation using natural language processing metrics shows the generated text's quality is close to a real user written review, identifying negation, misspellings, and domain specific vocabulary.

https://arxiv.org/abs/1707.05501 Story Generation from Sequence of Independent Short Descriptions

This paper introduces and addresses the task of coherent story generation from independent descriptions, describing a scene or an event. Towards this, we explore along two popular text-generation paradigms – (1) Statistical Machine Translation (SMT), posing story generation as a translation problem and (2) Deep Learning, posing story generation as a sequence to sequence learning problem.

https://arxiv.org/pdf/1708.07476v1.pdf M2D: Monolog to Dialog Generation for Conversational Story Telling

https://nlds.soe.ucsc.edu/personabank We present a new corpus, PersonaBank, consisting of 108 personal stories from weblogs that have been annotated with their Story Intention Graphs, a deep representation of the fabula of a story.

https://arxiv.org/abs/1708.08296 Explainable Artificial Intelligence: Understanding, Visualizing and Interpreting Deep Learning Models

This paper summarizes recent developments in this field and makes a plea for more interpretability in artificial intelligence. Furthermore, it presents two approaches to explaining predictions of deep learning models, one method which computes the sensitivity of the prediction with respect to changes in the input and one approach which meaningfully decomposes the decision in terms of the input variables.

https://arxiv.org/abs/1708.08573 Generating Different Story Tellings from Semantic Representations of Narrative

n this paper we present an automatic method for converting from Scheherazade's story intention graph, a semantic representation, to the input required by the Personage NLG engine. Using 36 Aesop Fables distributed in DramaBank, a collection of story encodings, we train translation rules on one story and then test these rules by generating text for the remaining 35. The results are measured in terms of the string similarity metrics Levenshtein Distance and BLEU score. The results show that we can generate the 35 stories with correct content: the test set stories on average are close to the output of the Scheherazade realizer, which was customized to this semantic representation. We provide some examples of story variations generated by personage. In future work, we will experiment with measuring the quality of the same stories generated in different voices, and with techniques for making storytelling interactive.

https://www.microsoft.com/en-us/research/video/understanding-black-box-predictions-via-influence-functions/ Understanding Black-box Predictions via Influence Functions

https://www.youtube.com/watch?v=0w9fLX_T6tY&feature=youtu.be

https://arxiv.org/abs/1711.09784 Distilling a Neural Network Into a Soft Decision Tree

https://arxiv.org/abs/1702.07826v2 Rationalization: A Neural Machine Translation Approach to Generating Natural Language Explanations

We evaluate our technique in the Frogger game environment, training an autonomous game playing agent to rationalize its action choices using natural language.

https://arxiv.org/abs/1801.09848v1 Over-representation of Extreme Events in Decision-Making: A Rational Metacognitive Account

To our knowledge, our model is the first metacognitive, resource-rational process model of cognitive biases in decision-making.

https://arxiv.org/abs/1802.04346 Hybrid Decision Making: When Interpretable Models Collaborate With Black-Box Models

We propose a novel metric, explainability, to measure the percentage of data that are sent to the interpretable model for decision. We also design a principled objective function that considers predictive accuracy, model interpretability, and data explainability. Under this framework, we develop Collaborative Black-box and RUle Set Hybrid (CoBRUSH) model that combines logic rules and any black-box model into a joint decision model. An input instance is first sent to the rules for decision. If a rule is satisfied, a decision will be directly generated. Otherwise, the black-box model is activated to decide on the instance. To train a hybrid model, we design an efficient search algorithm that exploits theoretically grounded strategies to reduce computation. Experiments show that CoBRUSH models are able to achieve same or better accuracy than their black-box collaborator working alone while gaining explainability.

https://github.com/slundberg/shap

https://github.com/davidmascharka/tbd-nets Transparency-by-Design networks (TbD-nets)

https://arxiv.org/abs/1802.08129 Multimodal Explanations: Justifying Decisions and Pointing to the Evidence

Our datasets define visual and textual justifications of a classification decision for activity recognition tasks (ACT-X) and for visual question answering tasks (VQA-X). We quantitatively show that training with the textual explanations not only yields better textual justification models, but also better localizes the evidence that supports the decision. We also qualitatively show cases where visual explanation is more insightful than textual explanation, and vice versa, supporting our thesis that multimodal explanation models offer significant benefits over unimodal approaches.

https://arxiv.org/pdf/1804.02086v1.pdf Hierarchical Disentangled Representations

Deep latent-variable models learn representations of high-dimensional data in an unsupervised manner. A number of recent efforts have focused on learning representations that disentangle statistically independent axes of variation, often by introducing suitable modifications of the objective function. We synthesize this growing body of literature by formulating a generalization of the evidence lower bound that explicitly represents the trade-offs between sparsity of the latent code, bijectivity of representations, and coverage of the support of the empirical data distribution. Our objective is also suitable to learning hierarchical representations that disentangle blocks of variables whilst allowing for some degree of correlations within blocks. Experiments on a range of datasets demonstrate that learned representations contain interpretable features, are able to learn discrete attributes, and generalize to unseen combinations of factors.

https://arxiv.org/abs/1803.07517v2 Explanation Methods in Deep Learning: Users, Values, Concerns and Challenges

https://christophm.github.io/interpretable-ml-book/

https://arxiv.org/abs/1807.08556v1 Explainable Neural Computation via Stack Neural Module Networks

https://arxiv.org/abs/1808.00196 Manifold: A Model-Agnostic Framework for Interpretation and Diagnosis of Machine Learning Models

https://arxiv.org/abs/1702.07826 Rationalization: A Neural Machine Translation Approach to Generating Natural Language Explanations

https://www.arxiv-vanity.com/papers/1806.01933/ Explainable Neural Networks based on Additive Index Models

n this paper, we present the Explainable Neural Network (xNN), a structured neural network designed especially to learn interpretable features. Unlike fully connected neural networks, the features engineered by the xNN can be extracted from the network in a relatively straightforward manner and the results displayed. With appropriate regularization, the xNN provides a parsimonious explanation of the relationship between the features and the output.

nlike commonly used neural network structures, the structure of the xNN describes the features it learns, via linear projections and univariate functions. These explainability features have the attractive feature of being additive in nature and straightforward to interpret. Whether the network is used as a primary model or a surrogate for a more complex model, the xNN provides straightforward explanations of how the model uses the input features to make predictions.