Geometry of Compositionality

This paper proposes a simple test for the compositionality (i.e., literal usage) of a word or phrase in a context-specific way. The test is computationally simple: it relies on no external resources and uses only a set of trained word vectors. Experiments show that the proposed method is competitive with the state of the art and achieves high accuracy in context-specific compositionality detection across a variety of natural language phenomena (idiomaticity, sarcasm, metaphor) on datasets in multiple languages. The key insight is to connect compositionality to a curious geometric property of word embeddings, which is of independent interest.

Bi-Directional Attention Flow for Machine Comprehension
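The geometric property behind the compositionality test can be operationalized, for instance, as the closeness of a phrase's vector to the subspace spanned by its constituent word vectors. The following is a sketch of one plausible formulation, not necessarily the paper's exact method; it uses only trained word vectors, as the abstract promises:

```python
import numpy as np

def compositionality_score(phrase_vec, word_vecs):
    """Cosine similarity between a phrase vector and its orthogonal
    projection onto the subspace spanned by the constituent word
    vectors; scores near 1 suggest literal (compositional) usage."""
    # Orthonormal basis of the span of the word vectors
    basis, _ = np.linalg.qr(np.stack(word_vecs, axis=1))
    projection = basis @ (basis.T @ phrase_vec)
    return float(projection @ phrase_vec /
                 (np.linalg.norm(projection) * np.linalg.norm(phrase_vec)))

# Toy sanity check: a vector inside the span is maximally compositional,
# a vector nearly orthogonal to it is not
u, v = np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])
literal_score = compositionality_score(u + v, [u, v])
odd_score = compositionality_score(np.array([0.1, 0.0, 1.0]), [u, v])
```

A low score would then flag non-literal usage (idiom, metaphor) of the phrase in that context.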

Stanford Question Answering Dataset (SQuAD) is a new reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage. With 100,000+ question-answer pairs on 500+ articles, SQuAD is significantly larger than previous reading comprehension datasets.

Question Answering Corpus

A Parallel-Hierarchical Model for Machine Comprehension on Sparse Data

Language Modeling with Gated Convolutional Networks

The predominant approach to language modeling to date is based on recurrent neural networks. In this paper we present a convolutional approach to language modeling. We introduce a novel gating mechanism that eases gradient propagation and that performs better than the LSTM-style gating of Oord et al. (2016) despite being simpler. We achieve a new state of the art on WikiText-103 as well as a new best single-GPU result on the Google Billion Word benchmark. In settings where latency is important, our model achieves an order-of-magnitude speed-up compared to a recurrent baseline, since computation can be parallelized over time. To our knowledge, this is the first time a non-recurrent approach outperforms strong recurrent models on these tasks.

ReasoNet: Learning to Stop Reading in Machine Comprehension
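The gating mechanism of the gated convolutional model is the gated linear unit, h = (XW + b) ⊗ σ(XV + c): a linear path modulated elementwise by a sigmoid gate. A minimal numpy sketch (shapes are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def glu(X, W, b, V, c):
    """Gated linear unit: (X @ W + b) * sigmoid(X @ V + c).
    Unlike LSTM-style gating there is no nonlinearity on the linear
    path, which leaves a linear route for gradients to flow through."""
    return (X @ W + b) * sigmoid(X @ V + c)

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))          # (timesteps, input channels)
W, V = rng.standard_normal((2, 8, 16))   # two separate weight matrices
b, c = np.zeros(16), np.zeros(16)
out = glu(X, W, b, V, c)
```

Because the gate lies in (0, 1), the output is always a damped version of the linear path; this is the simplification over LSTM-style gating the abstract refers to.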

In this paper, we propose ReasoNet, a model that dynamically decides whether to continue or to terminate the inference process in machine comprehension tasks. Using reinforcement learning with the proposed contractive reward, our model has been shown to achieve state-of-the-art results on machine comprehension datasets, including the unstructured CNN and Daily Mail datasets and a proposed structured Graph Reachability dataset.

Implicit ReasoNet: Modeling Large-Scale Structured Relationships with Shared Memory

Multi-Perspective Context Matching for Machine Comprehension

Linguistic Knowledge as Memory for Recurrent Neural Networks

Attention-based encoder-decoder model for neural machine translation
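ReasoNet's stop/continue control flow can be sketched as a loop with a sampled termination decision. This is illustrative only: `step_fn`, `terminate_prob_fn`, and `answer_fn` are placeholder names, and in the real model the sampled decisions are trained with reinforcement learning rather than hand-set probabilities.

```python
import numpy as np

def reason(state, step_fn, terminate_prob_fn, answer_fn, max_steps=5, seed=0):
    """Iterative inference with a termination gate: at each step,
    sample a stop/continue decision from the current state; if the
    model decides it has read enough, emit the answer."""
    rng = np.random.default_rng(seed)
    for t in range(max_steps):
        if rng.random() < terminate_prob_fn(state):
            break  # model chooses to stop reading
        state = step_fn(state)  # one more reasoning pass over memory
    return answer_fn(state), t

# With termination probability 0 the model reads for all max_steps turns
answer, steps_taken = reason(0, lambda s: s + 1, lambda s: 0.0,
                             lambda s: s, max_steps=3)
```

The point of the design is that the number of reasoning steps is instance-dependent instead of fixed in advance.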

We present Nematus, a toolkit for neural machine translation. The toolkit prioritizes high translation accuracy, usability, and extensibility. Nematus has been used to build top-performing submissions to shared translation tasks at WMT and IWSLT, and has been used to train systems for production environments.

Improving Document Clustering by Eliminating Unnatural Language

Technical documents contain a fair amount of unnatural language, such as tables, formulas, and pseudo-code. Unnatural language can be an important source of confusion for existing NLP tools. This paper presents an effective method for distinguishing unnatural language from natural language, and evaluates the impact of unnatural language detection on NLP tasks such as document clustering. We view this problem as an information extraction task and build a multiclass classification model that identifies unnatural language components and assigns them to one of four categories. First, we create a new annotated corpus by collecting slides and papers in various formats (PPT, PDF, and HTML), in which unnatural language components are annotated into the four categories. We then explore features available from plain text to build a statistical model that can handle any format, as long as it is converted into plain text. Our experiments show that removing unnatural language components yields an absolute improvement in document clustering of up to 15%. Our corpus and tool are publicly available.

A Comparative Study of Word Embeddings for Reading Comprehension
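"Features available from plain text", as the unnatural-language paper puts it, might look like the following surface cues; this is a hypothetical feature set for illustration, not the paper's actual features:

```python
def unnatural_features(line):
    """Plain-text surface cues that tend to separate prose from
    tables, formulas, and pseudo-code (hypothetical feature set)."""
    n = max(len(line), 1)
    tokens = line.split()
    return {
        # formulas and code are dense in operators and brackets
        "symbol_ratio": sum(not c.isalnum() and not c.isspace() for c in line) / n,
        # tables and formulas carry more digits than prose
        "digit_ratio": sum(c.isdigit() for c in line) / n,
        "avg_token_len": sum(map(len, tokens)) / len(tokens) if tokens else 0.0,
        "n_tokens": len(tokens),
    }

formula = unnatural_features("f(x) = alpha * x + 3.0")
prose = unnatural_features("The cat sat on the mat near the door.")
```

A multiclass classifier trained on such features would work on any input format that can be flattened to plain text, which is the property the paper exploits.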

The focus of past machine learning research for reading comprehension tasks has been primarily on the design of novel deep learning architectures. Here we show that seemingly minor choices made on (1) the use of pre-trained word embeddings and (2) the representation of out-of-vocabulary tokens at test time can turn out to have a larger impact than architectural choices on the final performance. We systematically explore several options for these choices, and provide recommendations to researchers working in this area.

Supervised Learning of Universal Sentence Representations from Natural Language Inference Data
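The out-of-vocabulary choices the comparative study examines can be illustrated with a small lookup helper. This is a sketch: `lookup` and its `oov_mode` parameter are assumptions for illustration, not an API from the paper.

```python
import numpy as np

def lookup(tokens, emb, dim, oov_mode="random", seed=0):
    """Embed tokens with an explicit out-of-vocabulary policy:
    'zero' maps every unseen token to the zero vector, while
    'random' assigns each unseen *type* its own fixed random
    vector, stable across repeated mentions."""
    rng = np.random.default_rng(seed)
    cache = {}
    rows = []
    for tok in tokens:
        if tok in emb:
            rows.append(emb[tok])
        elif oov_mode == "zero":
            rows.append(np.zeros(dim))
        else:
            if tok not in cache:  # reuse the vector for repeats
                cache[tok] = 0.1 * rng.standard_normal(dim)
            rows.append(cache[tok])
    return np.stack(rows)

emb = {"the": np.ones(4)}  # toy pre-trained embedding table
vecs = lookup(["the", "blorptex", "blorptex"], emb, dim=4)
```

The study's point is that seemingly minor decisions of exactly this kind can outweigh architectural differences.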

Many modern NLP systems rely on word embeddings, previously trained in an unsupervised manner on large corpora, as base features. Efforts to obtain embeddings for larger chunks of text, such as sentences, have however not been so successful. Several attempts at learning unsupervised representations of sentences have not reached satisfactory enough performance to be widely adopted. In this paper, we show how universal sentence representations trained using the supervised data of the Stanford Natural Language Inference dataset can consistently outperform unsupervised methods like SkipThought vectors on a wide range of transfer tasks. Much like how computer vision uses ImageNet to obtain features, which can then be transferred to other tasks, our work tends to indicate the suitability of natural language inference for transfer learning to other NLP tasks. Our sentence encoder is publicly available.
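The InferSent classifier combines two sentence embeddings u and v into the feature vector (u, v, |u − v|, u ∗ v) before the entailment classifier; as a quick sketch:

```python
import numpy as np

def pair_features(u, v):
    """Sentence-pair features used by InferSent-style NLI classifiers:
    concatenation of u, v, the absolute difference, and the
    elementwise product."""
    return np.concatenate([u, v, np.abs(u - v), u * v])

feats = pair_features(np.array([1.0, 2.0]), np.array([3.0, 1.0]))
```

The difference and product terms give the classifier direct access to symmetric similarity signals that plain concatenation would have to learn.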

Embodied Construction Grammar (ECG) is a construction grammar formalism designed specifically for integration with an embodied model of language comprehension. Where ECG differs from other construction grammars is in its concern with relating constructions to biologically plausible representations of a listener's own experiences and understanding of the world. This knowledge, which ranges from understandings as fundamental as movement through physical space to more abstract but still extra-linguistic concepts such as a prototypical commercial transaction, is encoded in schemas that describe the roles involved in each schema and their relationships in fulfilling the schematic experience a perceiver may construe. These roles and relationships provide a rich source of inference for language comprehension.

Recent Trends in Deep Learning Based Natural Language Processing
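The ECG schema idea, named roles plus relationships among them, can be caricatured as a tiny data structure. This is illustrative only, not ECG's actual notation; the role and binding names are invented for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Schema:
    """Toy stand-in for an ECG-style schema: a set of named roles
    plus bindings that relate them (illustrative, not ECG notation)."""
    name: str
    roles: list
    bindings: dict = field(default_factory=dict)

# The prototypical commercial transaction mentioned above, as a toy schema
transaction = Schema(
    name="CommercialTransaction",
    roles=["buyer", "seller", "goods", "money"],
    bindings={"goods": "transferred-to buyer",
              "money": "transferred-to seller"},
)
```

A comprehension system can then infer, from the bindings alone, facts a sentence never states explicitly, such as who ends up holding the goods.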

Deep learning methods employ multiple processing layers to learn hierarchical representations of data, and have produced state-of-the-art results in many domains. Recently, a variety of model designs and methods have blossomed in the context of natural language processing (NLP). In this paper, we review significant deep learning related models and methods that have been employed for numerous NLP tasks and provide a walk-through of their evolution. We also summarize, compare and contrast the various models and put forward a detailed understanding.

A Study on Neural Network Language Modeling

Grasping the Finer Point: A Supervised Similarity Network for Metaphor Detection

Neural Response Generation with Dynamic Vocabularies

We consider dynamically allocating a vocabulary to an input in the decoding stage for response generation in open-domain conversation. To this end, we propose a dynamic vocabulary sequence-to-sequence model, and derive a learning approach that can jointly optimize vocabulary construction and response generation through a Monte Carlo sampling method.

Advances in Pre-Training Distributed Word Representations
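Vocabulary construction in the dynamic-vocabulary model can be sketched as sampling each content word into the per-input decoding vocabulary independently, which is the kind of discrete choice that motivates the Monte Carlo learning approach. A simplified sketch; the function name and the `always_keep` set are assumptions:

```python
import numpy as np

def sample_vocabulary(inclusion_probs, always_keep, seed=0):
    """Draw a per-input decoding vocabulary: content word i enters
    with probability inclusion_probs[i] (which the model would
    compute from the input); words in always_keep (e.g. function
    words) are retained unconditionally."""
    rng = np.random.default_rng(seed)
    keep = rng.random(len(inclusion_probs)) < inclusion_probs
    return sorted(set(map(int, np.flatnonzero(keep))) | set(always_keep))

# Word ids 0 and 2 are certain, id 1 is excluded, id 5 is a kept function word
vocab = sample_vocabulary(np.array([1.0, 0.0, 1.0]), always_keep={5})
```

Decoding then runs over this small sampled subset instead of the full vocabulary, which is where the efficiency gain comes from.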

Many natural language processing applications nowadays rely on pre-trained word representations estimated from large text corpora such as news collections, Wikipedia, and Web Crawl. In this paper, we show how to train high-quality word vector representations by using a combination of known tricks that are, however, rarely used together. The main result of our work is a new set of publicly available pre-trained models that outperform the current state of the art by a large margin on a number of tasks.

Embed, encode, attend, predict: The new deep learning formula for state-of-the-art NLP models

Dynamic Integration of Background Knowledge in Neural NLU Systems

Common-sense or background knowledge is required to understand natural language, but in most neural natural language understanding (NLU) systems, the requisite background knowledge is indirectly acquired from static corpora. We develop a new reading architecture for the dynamic integration of explicit background knowledge in NLU models. A new task-agnostic reading module provides refined word representations to a task-specific NLU architecture by processing background knowledge in the form of free-text statements, together with the task-specific inputs. Strong performance on the tasks of document question answering (DQA) and recognizing textual entailment (RTE) demonstrates the effectiveness and flexibility of our approach. Analysis shows that our models learn to exploit knowledge selectively and in a semantically appropriate way.

NLP's ImageNet moment has arrived

Fake Sentence Detection as a Training Task for Sentence Encoding

R-NET: Machine Reading Comprehension with Self-Matching Networks

Neural Machine Translation of Rare Words with Subword Units

Character-Level Language Modeling with Deeper Self-Attention

LSTMs and other RNN variants have shown strong performance on character-level language modeling. These models are typically trained using truncated backpropagation through time, and it is common to assume that their success stems from their ability to remember long-term contexts. In this paper, we show that a deep (64-layer) transformer model with fixed context outperforms RNN variants by a large margin, achieving state of the art on two popular benchmarks: 1.13 bits per character on text8 and 1.06 on enwik8. To get good results at this depth, we show that it is important to add auxiliary losses, both at intermediate network layers and at intermediate sequence positions.

Optimally Segmenting Inputs for NMT Shows Preference for Character-Level Processing

In an evaluation on three translation tasks, we found that, given the freedom to navigate between different segmentation levels, the model prefers to operate on (almost) character level, providing support for purely character-level NMT models from a novel angle.

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Human-like Natural Language Generation Using Monte Carlo Tree Search

On the State of the Art of Evaluation in Neural Language Models

pair2vec: Compositional Word-Pair Embeddings for Cross-Sentence Inference

FRAGE: Frequency-Agnostic Word Representation