**References**

http://arxiv.org/pdf/1606.03126v1.pdf Key-Value Memory Networks for Directly Reading Documents

http://arxiv.org/pdf/1301.3618v2.pdf Neural Tensor Networks

http://arxiv.org/abs/1606.08660v2 Theory reconstruction: a representation learning view on predicate invention

https://blog.acolyer.org/2016/10/12/towards-deep-symbolic-reinforcement-learning/ Towards deep symbolic reinforcement learning

https://arxiv.org/pdf/1608.00318v1.pdf A Neural Knowledge Language Model

We propose a Neural Knowledge Language Model (NKLM) which combines symbolic knowledge provided by knowledge graphs with RNN language models. At each time step, the model predicts a fact on which the observed word is supposed to be based. Then, a word is either generated from the vocabulary or copied from the knowledge graph. We train and test the model on a new dataset, WikiFacts. In experiments, we show that the NKLM significantly improves the perplexity while generating a much smaller number of unknown words.

https://arxiv.org/pdf/1612.00222v1.pdf Interaction Networks for Learning about Objects, Relations and Physics

Reasoning about objects, relations, and physics is central to human intelligence, and a key goal of artificial intelligence. Here we introduce the interaction network, a model which can reason about how objects in complex systems interact, supporting dynamical predictions, as well as inferences about the abstract properties of the system. **Our model takes graphs as input, performs object- and relation-centric reasoning in a way that is analogous to a simulation, and is implemented using deep neural networks.** We evaluate its ability to reason about several challenging physical domains: n-body problems, rigid-body collision, and non-rigid dynamics. Our results show it can be trained to accurately simulate the physical trajectories of dozens of objects over thousands of time steps, estimate abstract quantities such as energy, and generalize automatically to systems with different numbers and configurations of objects and relations. **Our interaction network implementation is the first general-purpose, learnable physics engine, and a powerful general framework for reasoning about objects and relations in a wide variety of complex real-world domains.**
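
The relation-centric and object-centric steps mentioned in this abstract can be illustrated with a minimal sketch. It assumes scalar object states and hand-written stand-ins (`toy_relation`, `toy_object`) for what the paper implements as learned neural networks; all names here are hypothetical and are not taken from the authors' code.

```python
def interaction_step(objects, relations, relation_fn, object_fn):
    """One relation-centric then object-centric update over a graph.

    objects: dict mapping object name -> state (a float here, for brevity).
    relations: list of (sender, receiver) edges.
    relation_fn(sender_state, receiver_state) -> effect on the receiver.
    object_fn(state, summed_effect) -> next state.
    """
    # Relation-centric step: compute one effect per edge, summed per receiver.
    effects = {name: 0.0 for name in objects}
    for sender, receiver in relations:
        effects[receiver] += relation_fn(objects[sender], objects[receiver])
    # Object-centric step: update every object from its aggregated effects.
    return {name: object_fn(state, effects[name]) for name, state in objects.items()}


# Toy stand-ins for the learned networks (assumptions for illustration only).
def toy_relation(sender, receiver):
    return 0.5 * (sender - receiver)  # spring-like pull toward the sender


def toy_object(state, effect):
    return state + effect


positions = {"a": 0.0, "b": 2.0, "c": 4.0}
edges = [("a", "b"), ("b", "a"), ("c", "b")]
next_positions = interaction_step(positions, edges, toy_relation, toy_object)
```

Because the update is defined per edge and per object, the same function applies unchanged to graphs with different numbers of objects and relations.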

https://arxiv.org/pdf/1701.01358.pdf NeuroRule: A Connectionist Approach to Data Mining

Classification, which involves finding rules that partition a given data set into disjoint groups, is one class of data mining problems. Approaches proposed so far for mining classification rules for large databases are mainly decision tree based symbolic learning methods. The connectionist approach based on neural networks has been thought not well suited for data mining. One of the major reasons cited is that knowledge generated by neural networks is not explicitly represented in the form of rules suitable for verification or interpretation by humans. This paper examines this issue. With our newly developed algorithms, rules which are similar to, or more concise than those generated by the symbolic methods can be extracted from the neural networks. The data mining process using neural networks with the emphasis on rule extraction is described. Experimental results and comparison with previously published works are presented.

https://arxiv.org/pdf/1611.01628.pdf Reference-Aware Language Models

We propose a general class of language models that treat reference as explicit stochastic latent variables. This architecture allows models to create mentions of entities and their attributes by accessing external databases (required by, e.g., dialogue generation and recipe generation) and internal state (required by, e.g. language models which are aware of coreference). This facilitates the incorporation of information that can be accessed in predictable locations in databases or discourse context, even when the targets of the reference may be rare words. Experiments on three representative applications show our model variants outperform models based on deterministic attention.

https://arxiv.org/abs/1702.05068v1 Discovering objects and their relations from entangled scene representations

In this work, we introduce relation networks (RNs) - a general purpose neural network architecture for object-relation reasoning. We show that RNs are capable of learning object relations from scene description data. Furthermore, we show that RNs can act as a bottleneck that induces the factorization of objects from entangled scene description inputs, and from distributed deep representations of scene images provided by a variational autoencoder. The model can also be used in conjunction with differentiable memory mechanisms for implicit relation discovery in one-shot learning tasks. Our results suggest that relation networks are a potentially powerful architecture for solving a variety of problems that require object relation reasoning.

https://arxiv.org/abs/1702.08367 Differentiable Learning of Logical Rules for Knowledge Base Completion

We propose an alternative approach: a completely differentiable model for learning sets of first-order rules. The approach is inspired by a recently-developed differentiable logic, i.e. a subset of first-order logic for which inference tasks can be compiled into sequences of differentiable operations. Here we describe a neural controller system which learns how to sequentially compose these primitive differentiable operations to solve reasoning tasks, and in particular, to perform knowledge base completion. The long-term goal of this work is to develop integrated, end-to-end systems that can learn to perform high-level logical reasoning as well as lower-level perceptual tasks.

https://arxiv.org/abs/1703.08098v1 An overview of embedding models of entities and relationships for knowledge base completion

https://arxiv.org/pdf/1609.00777v3.pdf Towards End-to-End Reinforcement Learning of Dialogue Agents for Information Access

https://arxiv.org/pdf/1611.04642.pdf Modeling Large-Scale Structured Relationships with Shared Memory for Knowledge Base Completion

Recent studies on knowledge base completion, the task of recovering missing relationships based on recorded relations, demonstrate the importance of learning embeddings from multi-step relations. However, due to the size of knowledge bases, learning multi-step relations directly on top of observed triplets could be costly. Hence, a manually designed procedure is often used when training the models. In this paper, we propose Implicit ReasoNets (IRNs), which is designed to perform multi-step inference implicitly through a controller and shared memory. Without a human-designed inference procedure, IRNs use training data to learn to perform multi-step inference in an embedding neural space through the shared memory and controller.

https://arxiv.org/pdf/1704.05908v2.pdf An Interpretable Knowledge Transfer Model for Knowledge Base Completion

We propose a novel embedding model, ITransF, to perform knowledge base completion. Equipped with a sparse attention mechanism, ITransF discovers hidden concepts of relations and transfers statistical strength through the sharing of concepts. Moreover, the learned associations between relations and concepts, which are represented by sparse attention vectors, can be interpreted easily.

https://arxiv.org/abs/1705.03645v1 A Survey of Deep Learning Methods for Relation Extraction

Relation Extraction is an important sub-task of Information Extraction which has the potential of employing deep learning (DL) models with the creation of large datasets using distant supervision. In this review, we compare the contributions and pitfalls of the various DL models that have been used for the task, to help guide the path ahead.

https://arxiv.org/pdf/1705.10342v1.pdf Deep Learning for Ontology Reasoning

To this end, we introduce a new model for statistical relational learning that is built upon deep recursive neural networks, and give experimental evidence that it can easily compete with, or even outperform, existing logic-based reasoners on the task of ontology reasoning. More precisely, we compared our implemented system with one of the best logic-based ontology reasoners at present, RDFox, on a number of large standard benchmark datasets, and found that our system attained high reasoning quality, while being up to two orders of magnitude faster.

https://arxiv.org/pdf/1705.11040v1.pdf End-to-end Differentiable Proving

These neural networks are constructed recursively by taking inspiration from the backward chaining algorithm as used in Prolog. Specifically, we replace symbolic unification with a differentiable computation on vector representations of symbols using a radial basis function kernel, thereby combining symbolic reasoning with learning subsymbolic vector representations. By using gradient descent, the resulting neural network can be trained to infer facts from a given incomplete knowledge base. It learns to (i) place representations of similar symbols in close proximity in a vector space, (ii) make use of such similarities to prove facts, (iii) induce logical rules, and (iv) use provided and induced logical rules for complex multi-hop reasoning. We demonstrate that this architecture outperforms ComplEx, a state-of-the-art neural link prediction model, on four benchmark knowledge bases while at the same time inducing interpretable function-free first-order logic rules.
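
As a rough illustration of replacing symbolic unification with a kernel on symbol vectors, the sketch below scores two symbols with a Gaussian-style radial basis function. The exact kernel form, the bandwidth `mu`, and the toy embeddings are assumptions made for this example and may differ from the paper's definition.

```python
import math


def rbf_unify(a, b, mu=1.0):
    """Soft unification score in (0, 1]: 1.0 for identical vectors, decaying with distance.

    Assumes a Gaussian kernel on squared Euclidean distance with bandwidth mu.
    """
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * mu ** 2))


# Hypothetical 2-D embeddings for three predicate symbols.
embeddings = {
    "grandfatherOf": [1.0, 0.0],
    "grandpaOf": [1.0, 0.1],
    "locatedIn": [-1.0, 2.0],
}

same = rbf_unify(embeddings["grandfatherOf"], embeddings["grandfatherOf"])
near = rbf_unify(embeddings["grandfatherOf"], embeddings["grandpaOf"])
far = rbf_unify(embeddings["grandfatherOf"], embeddings["locatedIn"])
```

Symbols with nearby embeddings unify with a score close to 1, which is what lets a rule stated for one predicate apply softly to a similar one; the score is differentiable in the embeddings, so it can be trained by gradient descent.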

https://arxiv.org/pdf/1706.01427v1.pdf A simple neural network module for relational reasoning

Relational reasoning is a central component of generally intelligent behavior, but has proven difficult for neural networks to learn. In this paper we describe how to use Relation Networks (RNs) as a simple plug-and-play module to solve problems that fundamentally hinge on relational reasoning. We tested RN-augmented networks on three tasks: visual question answering using a challenging dataset called CLEVR, on which we achieve state-of-the-art, super-human performance; text-based question answering using the bAbI suite of tasks; and complex reasoning about dynamic physical systems. Then, using a curated dataset called Sort-of-CLEVR we show that powerful convolutional networks do not have a general capacity to solve relational questions, but can gain this capacity when augmented with RNs. Our work shows how a deep learning architecture equipped with an RN module can implicitly discover and learn to reason about entities and their relations.
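
The RN module in this paper has the composite form RN(O) = f(Σ g(o_i, o_j)), summed over pairs of objects. A minimal sketch follows, using plain Python lists and toy functions in place of the learned MLPs; `toy_g` and `toy_f` are hypothetical stand-ins, not the paper's networks.

```python
from itertools import product


def relation_network(objects, g, f):
    """Compute f(sum over ordered pairs (i, j) of g(o_i, o_j)).

    objects: non-empty list of object vectors (lists of floats).
    g: pairwise relation function returning a vector (list of floats).
    f: readout function applied to the summed relation vector.
    """
    if not objects:
        raise ValueError("relation_network needs at least one object")
    total = None
    for o_i, o_j in product(objects, repeat=2):
        r = g(o_i, o_j)
        total = r if total is None else [a + b for a, b in zip(total, r)]
    return f(total)


# Toy stand-ins for the learned MLPs (assumptions for illustration only).
def toy_g(a, b):
    return [abs(a[0] - b[0])]  # "distance" between two 1-D objects


def toy_f(v):
    return v[0]


objects = [[0.0], [1.0], [3.0]]
score = relation_network(objects, toy_g, toy_f)
```

Summing over all pairs makes the output independent of the order in which objects are presented, which is why the module can be attached to the output of another network without imposing an object ordering.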

https://arxiv.org/pdf/1706.06383.pdf Programmable Agents

We build deep RL agents that execute declarative programs expressed in formal language. The agents learn to ground the terms in this language in their environment, and can generalize their behavior at test time to execute new programs that refer to objects that were not referenced during training. The agents develop disentangled interpretable representations that allow them to generalize to a wide variety of zero-shot semantic tasks.

https://arxiv.org/abs/1706.07179v1 RelNet: End-to-end Modeling of Entities & Relations

We introduce RelNet: a new model for relational reasoning. RelNet is a memory augmented neural network which models entities as abstract memory slots and is equipped with an additional relational memory which models relations between all memory pairs. The model thus builds an abstract knowledge graph on the entities and relations present in a document which can then be used to answer questions about the document. It is trained end-to-end: the only supervision to the model is in the form of correct answers to the questions. We test the model on the 20 bAbI question-answering tasks with 10k examples per task and find that it solves all the tasks with a mean error of 0.3%, achieving 0% error on 11 of the 20 tasks.

https://arxiv.org/abs/1706.08186v1 Automatic Synonym Discovery with Knowledge Bases

We propose a novel framework, called DPE, to integrate two kinds of mutually-complementing signals for synonym discovery, i.e., distributional features based on corpus-level statistics and textual patterns based on local contexts. In particular, DPE jointly optimizes the two kinds of signals in conjunction with distant supervision, so that they can mutually enhance each other in the training stage. At the inference stage, both signals will be utilized to discover synonyms for the given entities. Experimental results prove the effectiveness of the proposed framework.

https://arxiv.org/abs/1705.02426v2 Analogical Inference for Multi-Relational Embeddings

This paper proposes a novel framework for optimizing the latent representations with respect to the analogical properties of the embedded entities and relations. https://github.com/codeaudit/ANALOGY

https://arxiv.org/abs/1702.08367 Differentiable Learning of Logical Rules for Knowledge Base Reasoning

We propose a framework, Neural Logic Programming, that combines the parameter and structure learning of first-order logical rules in an end-to-end differentiable model. This approach is inspired by a recently-developed differentiable logic called TensorLog, where inference tasks can be compiled into sequences of differentiable operations. We design a neural controller system that learns to compose these operations. Empirically, our method obtains state-of-the-art results on multiple knowledge base benchmark datasets, including Freebase and WikiMovies.

https://github.com/TeamCohen/ProPPR
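
One way to picture inference compiled into differentiable operations is to store each relation as an adjacency matrix and answer a chain rule by repeated matrix-vector products. The sketch below assumes that encoding, with toy facts and hypothetical names; it is not the TensorLog or Neural LP implementation.

```python
def apply_relation(matrix, vec):
    """Multiply an adjacency matrix by an entity-weight vector.

    matrix[i][j] == 1.0 means relation(entity_j, entity_i) holds, so the product
    moves weight from each subject to the objects it is related to.
    """
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


entities = ["alice", "bob", "carol"]

# Toy facts: parent_of(alice, bob) and parent_of(bob, carol).
parent_of = [
    [0.0, 0.0, 0.0],  # nobody is a parent of alice
    [1.0, 0.0, 0.0],  # alice -> bob
    [0.0, 1.0, 0.0],  # bob -> carol
]

# Rule: grandparent_of(X, Z) <- parent_of(X, Y), parent_of(Y, Z)
query = [1.0, 0.0, 0.0]  # one-hot vector for X = alice
scores = apply_relation(parent_of, apply_relation(parent_of, query))
answer = entities[scores.index(max(scores))]
```

Every step is a sum of products, so if the choice of which matrix to apply at each step is replaced by a weighted mixture, those weights can be learned by gradient descent.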

https://arxiv.org/abs/1707.03377 Learning like humans with Deep Symbolic Networks

First, it is universal, using the same structure to store any knowledge. Second, it can learn symbols from the world and construct the deep symbolic networks automatically, by utilizing the fact that real world objects have been naturally separated by singularities. Third, it is symbolic, with the capacity of performing causal deduction and generalization. Fourth, the symbols and the links between them are transparent to us, and thus we will know what it has learned or not - which is the key for the security of an AI system. Fifth, its transparency enables it to learn with relatively small data. Sixth, its knowledge can be accumulated. Last but not least, it is more friendly to unsupervised learning than DNN.

https://arxiv.org/abs/1707.06690v1 DeepPath: A Reinforcement Learning Method for Knowledge Graph Reasoning

We describe a novel reinforcement learning framework for learning multi-hop relational paths: we use a policy-based agent with continuous states based on knowledge graph embeddings, which reasons in a KG vector space by sampling the most promising relation to extend its path. In contrast to prior work, our approach includes a reward function that takes the accuracy, diversity, and efficiency into consideration. Experimentally, we show that our proposed method outperforms a path-ranking based algorithm and knowledge graph embedding methods on Freebase and Never-Ending Language Learning datasets.

https://arxiv.org/pdf/1707.08139.pdf Analogs of Linguistic Structure in Deep Representations

By comparing truth-conditional representations of encoder-produced message vectors to human-produced referring expressions, we are able to identify aligned (vector, utterance) pairs with the same meaning. We then search for structured relationships among these aligned pairs to discover simple vector space transformations corresponding to negation, conjunction, and disjunction. Our results suggest that neural representations are capable of spontaneously developing a “syntax” with functional analogues to qualitative properties of natural language.

https://arxiv.org/pdf/1708.03310v1.pdf Thinking Fast, Thinking Slow! Combining Knowledge Graphs and Vector Spaces.

https://arxiv.org/abs/1709.03980v1 Refining Source Representations with Relation Networks for Neural Machine Translation

https://arxiv.org/pdf/1710.10881v1.pdf Fast Linear Model for Knowledge Graph Embeddings

https://arxiv.org/abs/1705.08439 Thinking Fast and Slow with Deep Learning and Tree Search

In this paper, we present Expert Iteration (ExIt), a novel reinforcement learning algorithm which decomposes the problem into separate planning and generalisation tasks. Planning new policies is performed by tree search, while a deep neural network generalises those plans. Subsequently, tree search is improved by using the neural network policy to guide search, increasing the strength of new plans. In contrast, standard deep Reinforcement Learning algorithms rely on a neural network not only to generalise plans, but to discover them too. We show that ExIt outperforms REINFORCE for training a neural network to play the board game Hex, and our final tree search agent, trained tabula rasa, defeats MoHex, the previous state-of-the-art Hex player.

https://arxiv.org/pdf/1708.08557.pdf A parameterized activation function for learning fuzzy logic operations in deep neural networks

https://arxiv.org/abs/1711.04574 Learning Explanatory Rules from Noisy Data

https://arxiv.org/pdf/1711.06025.pdf Learning to Compare: Relation Network for Few-Shot Learning

https://rasmusbergpalm.github.io/recurrent-relational-networks/ To show that the RRN can solve problems requiring very complex relational reasoning we use it for solving Sudoku puzzles.

https://arxiv.org/pdf/1711.09576.pdf Extracting Automata from Recurrent Neural Networks Using Queries and Counterexamples

https://arxiv.org/abs/1712.09687 Combining Representation Learning with Logic for Language Processing

We proposed a way to calculate the gradient of propositional logic rules with respect to parameters of a neural link prediction model (Chapter 3). By stochastically grounding first-order logic rules, we were able to use these rules as regularizers in a matrix factorization neural link prediction model for automated Knowledge Base (KB) completion. This allowed us to embed background knowledge in the form of logical rules in the vector space of predicate and entity pair representations. Using this method, we were able to train relation extractors for predicates with provided rules but little or no known training facts.

This thesis investigates different combinations of representation learning methods with logic for reducing the need for annotated training data, and for improving generalization.

http://www.jair.org/media/5714/live-5714-10391-jair.pdf Learning Explanatory Rules from Noisy Data

https://arxiv.org/pdf/1802.00050.pdf Recursive Feature Generation for Knowledge-based Learning

With the increasing availability of well-formed collaborative knowledge bases, the performance of learning algorithms could be significantly enhanced if a way were found to exploit these knowledge bases. In this work, we present a novel algorithm for injecting external knowledge into induction algorithms using feature generation. Given a feature, the algorithm defines a new learning task over its set of values, and uses the knowledge base to solve the constructed learning task. The resulting classifier is then used as a new feature for the original problem.

https://arxiv.org/abs/1802.01021v1 DeepType: Multilingual Entity Linking by Neural Type System Evolution

DeepType overcomes this challenge by explicitly integrating symbolic information into the reasoning process of a neural network with a type system. First we construct a type system, and second, we use it to constrain the outputs of a neural network to respect the symbolic structure. We achieve this by reformulating the design problem into a mixed integer problem: create a type system and subsequently train a neural network with it. In this reformulation discrete variables select which parent-child relations from an ontology are types within the type system, while continuous variables control a classifier fit to the type system. The original problem cannot be solved exactly, so we propose a 2-step algorithm: 1) heuristic search or stochastic optimization over discrete variables that define a type system informed by an Oracle and a Learnability heuristic, 2) gradient descent to fit classifier parameters. We apply DeepType to the problem of Entity Linking on three standard datasets (i.e. WikiDisamb30, CoNLL (YAGO), TAC KBP 2010) and find that it outperforms all existing solutions by a wide margin, including approaches that rely on a human-designed type system or recent deep learning-based entity embeddings, while explicitly using symbolic information lets it integrate new entities without retraining.

https://arxiv.org/pdf/1711.11575.pdf Relation Networks for Object Detection

https://arxiv.org/pdf/1802.10353v1.pdf Relational Neural Expectation Maximization: Unsupervised Discovery of Objects and their Interactions

It incorporates prior knowledge about the compositional nature of human perception to factor interactions between object-pairs and learn efficiently. On videos of bouncing balls we show the superior modelling capabilities of our method compared to other unsupervised neural approaches that do not incorporate such prior knowledge. We demonstrate its ability to handle occlusion and show that it can extrapolate learned knowledge to scenes with different numbers of objects.

https://arxiv.org/abs/1803.08035 Zero-shot Recognition via Semantic Embeddings and Knowledge Graphs

https://arxiv.org/abs/1803.11189v1 Iterative Visual Reasoning Beyond Convolutions

The framework consists of two core modules: a local module that uses spatial memory to store previous beliefs with parallel updates; and a global graph-reasoning module. Our graph module has three components: a) a knowledge graph where we represent classes as nodes and build edges to encode different types of semantic relationships between them; b) a region graph of the current image where regions in the image are nodes and spatial relationships between these regions are edges; c) an assignment graph that assigns regions to classes. Both the local module and the global module roll-out iteratively and cross-feed predictions to each other to refine estimates. The final predictions are made by combining the best of both modules with an attention mechanism. We show strong performance over plain ConvNets, e.g., achieving an 8.4% absolute improvement on ADE measured by per-class average precision. Analysis also shows that the framework is resilient to missing regions for reasoning.

https://arxiv.org/abs/1709.07871v2 FiLM: Visual Reasoning with a General Conditioning Layer

We introduce a general-purpose conditioning method for neural networks called FiLM: Feature-wise Linear Modulation. FiLM layers influence neural network computation via a simple, feature-wise affine transformation based on conditioning information. We show that FiLM layers are highly effective for visual reasoning - answering image-related questions which require a multi-step, high-level process - a task which has proven difficult for standard deep learning methods that do not explicitly model reasoning. Specifically, we show on visual reasoning tasks that FiLM layers 1) halve state-of-the-art error for the CLEVR benchmark, 2) modulate features in a coherent manner, 3) are robust to ablations and architectural modifications, and 4) generalize well to challenging, new data from few examples or even zero-shot.
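
The feature-wise affine transformation described here amounts to scaling and shifting each feature channel. A minimal sketch, assuming plain Python lists in place of tensors: the function name `film` and the example values are illustrative, and `gamma` and `beta` are passed in directly, whereas in FiLM they would be predicted from the conditioning information.

```python
def film(features, gamma, beta):
    """Apply a feature-wise affine transform: gamma[c] * x + beta[c] per channel c.

    features: list of channels, each a list of activations.
    gamma, beta: one scale and one shift per channel (hypothetical values here;
    in FiLM they are produced by a conditioning network).
    """
    if not (len(features) == len(gamma) == len(beta)):
        raise ValueError("need one gamma and one beta per feature channel")
    return [[g * x + b for x in channel]
            for channel, g, b in zip(features, gamma, beta)]


features = [[1.0, 2.0], [3.0, 4.0]]
modulated = film(features, gamma=[2.0, 0.0], beta=[0.5, 1.0])
```

In this example a `gamma` of zero overwrites the second channel with its `beta`, which shows how the conditioning input can suppress a feature map as well as rescale it.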

https://arxiv.org/abs/1803.03067 Compositional Attention Networks for Machine Reasoning

We present the MAC network, a novel fully differentiable neural network architecture, designed to facilitate explicit and expressive reasoning. MAC moves away from monolithic black-box neural architectures towards a design that encourages both transparency and versatility. The model approaches problems by decomposing them into a series of attention-based reasoning steps, each performed by a novel recurrent Memory, Attention, and Composition (MAC) cell that maintains a separation between control and memory. By stringing the cells together and imposing structural constraints that regulate their interaction, MAC effectively learns to perform iterative reasoning processes that are directly inferred from the data in an end-to-end approach. We demonstrate the model's strength, robustness and interpretability on the challenging CLEVR dataset for visual reasoning, achieving a new state-of-the-art 98.9% accuracy, halving the error rate of the previous best model. More importantly, we show that the model is computationally-efficient and data-efficient, in particular requiring 5x less data than existing models to achieve strong results.

https://github.com/stanfordnlp/mac-network

https://arxiv.org/abs/1805.09354v1 Working Memory Networks: Augmenting Memory Networks with a Relational Reasoning Module

https://arxiv.org/abs/1806.01261 Relational inductive biases, deep learning, and graph networks

We argue that combinatorial generalization must be a top priority for AI to achieve human-like abilities, and that structured representations and computations are key to realizing this objective. We present a new building block for the AI toolkit with a strong relational inductive bias–the graph network–which generalizes and extends various approaches for neural networks that operate on graphs, and provides a straightforward interface for manipulating structured knowledge and producing structured behaviors. We discuss how graph networks can support relational reasoning and combinatorial generalization, laying the foundation for more sophisticated, interpretable, and flexible patterns of reasoning.

https://arxiv.org/abs/1807.03877v1 Deep Structured Generative Models

In particular, the layout or structure of the scene is encoded by a stochastic and-or graph (sAOG), in which the terminal nodes represent single objects and edges represent relations between objects.

https://arxiv.org/abs/1807.08204 Towards Neural Theorem Proving at Scale

We focus on the Neural Theorem Prover (NTP) model proposed by Rocktäschel and Riedel (2017), a continuous relaxation of the Prolog backward chaining algorithm where unification between terms is replaced by the similarity between their embedding representations. For answering a given query, this model needs to consider all possible proof paths, and then aggregate results - this quickly becomes infeasible even for small Knowledge Bases (KBs). We observe that we can accurately approximate the inference process in this model by considering only proof paths associated with the highest proof scores.

https://arxiv.org/abs/1807.08058v1 Learning Heuristics for Automated Reasoning through Deep Reinforcement Learning

We demonstrate how to learn efficient heuristics for automated reasoning algorithms through deep reinforcement learning. We consider search algorithms for quantified Boolean logics, that already can solve formulas of impressive size - up to 100s of thousands of variables. The main challenge is to find a representation which lends itself to making predictions in a scalable way. The heuristics learned through our approach significantly improve over the handwritten heuristics for several sets of formulas.

https://arxiv.org/abs/1808.02822v1 Backprop Evolution

https://arxiv.org/abs/1808.06068 SeVeN: Augmenting Word Embeddings with Unsupervised Relation Vectors

https://arxiv.org/abs/1808.07980 Ontology Reasoning with Deep Neural Networks