https://arxiv.org/abs/1409.0473v7 Neural Machine Translation by Jointly Learning to Align and Translate

Unlike the traditional statistical machine translation, the neural machine translation aims at building a single neural network that can be jointly tuned to maximize the translation performance.

https://arxiv.org/pdf/1610.10099v1.pdf Neural Machine Translation in Linear Time

We present a neural translation model, the ByteNet, and a neural language model, the ByteNet Decoder, that aim at addressing these drawbacks. The ByteNet uses convolutional neural networks with dilation for both the source network and the target network. The ByteNet connects the source and target networks via stacking and unfolds the target network dynamically to generate variable length output sequences. We view the ByteNet as an instance of a wider family of sequence-mapping architectures that stack the sub-networks and use dynamic unfolding. The sub-networks themselves may be convolutional or recurrent.

https://phillipi.github.io/pix2pix/ Image-to-Image Translation with Conditional Adversarial Nets

We investigate conditional adversarial networks as a general-purpose solution to image-to-image translation problems. These networks not only learn the mapping from input image to output image, but also learn a loss function to train this mapping. This makes it possible to apply the same generic approach to problems that traditionally would require very different loss formulations. We demonstrate that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks. As a community, we no longer hand-engineer our mapping functions, and this work suggests we can achieve reasonable results without hand-engineering our loss functions either.

phillipi.github.io_pix2pix_images_teaser_v3.jpg

https://arxiv.org/pdf/1611.05358v1.pdf Lip Reading Sentences in the Wild

The goal of this work is to recognise phrases and sentences being spoken by a talking face, with or without the audio. Unlike previous works that have focussed on recognising a limited number of words or phrases, we tackle lip reading as an open-world problem – unconstrained natural language sentences, and in the wild videos. Our key contributions are: (1) a ‘Watch, Listen, Attend and Spell’ (WLAS) network that learns to transcribe videos of mouth motion to characters; (2) a curriculum learning strategy to accelerate training and to reduce overfitting; (3) a ‘Lip Reading Sentences’ (LRS) dataset for visual speech recognition, consisting of over 100,000 natural sentences from British television. The WLAS model trained on the LRS dataset surpasses the performance of all previous work on standard lip reading benchmark datasets, often by a significant margin. This lip reading performance beats a professional lip reader on videos from BBC television, and we also demonstrate that visual information helps to improve speech recognition performance even when the audio is available.

https://arxiv.org/abs/1703.01619v1 Neural Machine Translation and Sequence-to-sequence Models: A Tutorial

This tutorial introduces a new and powerful set of techniques variously called “neural machine translation” or “neural sequence-to-sequence models”. These techniques have been used in a number of tasks regarding the handling of human language, and can be a powerful tool in the toolbox of anyone who wants to model sequential data of some sort. The tutorial assumes that the reader knows the basics of math and programming, but does not assume any particular experience with neural networks or natural language processing. It attempts to explain the intuition behind the various methods covered, then delves into them with enough mathematical detail to understand them concretely, and culiminates with a suggestion for an implementation exercise, where readers can test that they understood the content in practice.

https://arxiv.org/pdf/1704.06933v1.pdf Adversarial Neural Machine Translation

In this paper, we study a new learning paradigm for Neural Machine Translation (NMT). Instead of maximizing the likelihood of the human translation as in previous works, we minimize the distinction between human translation and the translation given by a NMT model. To achieve this goal, inspired by the recent success of generative adversarial networks (GANs), we employ an adversarial training architecture and name it as AdversarialNMT. In Adversarial-NMT, the training of the NMT model is assisted by an adversary, which is an elaborately designed Convolutional Neural Network (CNN). The goal of the adversary is to differentiate the translation result generated by the NMT model from that by human. The goal of the NMT model is to produce high quality translations so as to cheat the adversary. A policy gradient method is leveraged to co-train the NMT model and the adversary. Experimental results on English→French and German→English translation tasks show that Adversarial-NMT can achieve significantly better translation quality than several strong baselines.

https://arxiv.org/abs/1704.06933 Adversarial Neural Machine Translation

Instead of maximizing the likelihood of the human translation as in previous works, we minimize the distinction between human translation and the translation given by a NMT model.

https://arxiv.org/pdf/1707.07631v1.pdf Deep Architectures for Neural Machine Translation

https://arxiv.org/abs/1808.09943 Revisiting Character-Based Neural Machine Translation with Capacity and Compression

https://arxiv.org/pdf/1804.07755.pdf Phrase-Based & Neural Unsupervised Machine Translation https://www.forbes.com/sites/williamfalcon/2018/09/01/facebook-just-set-a-new-record-in-translation-and-why-it-matters/#5006e0ab2bc2 https://github.com/facebookresearch/UnsupervisedMT https://code.fb.com/ai-research/unsupervised-machine-translation-a-novel-approach-to-provide-fast-accurate-translations-for-more-languages/