Late Binding: A Flexible Online Classifier using Supervised Generative Reconstruction During Recognition

This work proposes a recognition algorithm that uses dynamics in the opposite way from traditional recognition algorithms. Recognition algorithms, also known as classifier algorithms, operate in two phases: a learning phase and a testing phase. Most classifier algorithms use only feedforward connections during testing; dynamic feedforward-feedback signals are used during the learning phase to learn those feedforward connections. A supervised generative classifier is proposed whose key innovation is that the dynamic feedforward-feedback component is essential during the testing phase rather than the learning phase. This changes the form in which learned information is stored: the new form represents the fixed points, or solutions, of the network. It allows for more intuitive, symbolic-like weights and more flexible online learning, and the dynamics involved in the recognition phase emulate cognitive phenomena. The brain-like architecture, capabilities, and limits associated with this model suggest the brain may perform recognition and store information using a similar approach.

Evaluating the Contribution of Top-Down Feedback and Post-Learning Reconstruction

Attentive Recurrent Comparators

Rapid and continual learning models require that the representation space they use be dynamic, changing constantly as the model encounters new evidence. While there have recently been many end-to-end, meta-learning-based approaches, they have significant drawbacks. We present a novel model that crudely approximates having a dynamic representation space at inference time: the entire representation space is defined relative to the test example, and the full context of this relative representation space is considered before the model makes a prediction. We extensively test all aspects of our model across various real-world datasets. In the challenging task of one-shot learning on the Omniglot dataset, our model achieves the first superhuman performance for a neural method, with an error rate of 1.5%.
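The one-shot classification task mentioned above has a simple generic form: given a support set containing one labelled example per class and a query, predict the query's class with a pairwise similarity function. A minimal sketch of that protocol follows; the names `one_shot_predict` and `neg_dist`, the nearest-neighbour decision rule, and the toy 2-D embeddings are all illustrative assumptions, not the paper's model:

```python
import numpy as np

def one_shot_predict(support, labels, query, similarity):
    """Classify `query` as the label of the most similar support example.

    support: (n, d) array, one embedded example per class
    labels: length-n list of class labels
    query: (d,) embedded test example
    similarity: callable (a, b) -> float, higher means more similar
    """
    scores = [similarity(x, query) for x in support]
    return labels[int(np.argmax(scores))]

# Illustrative similarity: negative Euclidean distance between embeddings.
def neg_dist(a, b):
    return -np.linalg.norm(a - b)

support = np.array([[0.0, 0.0], [10.0, 10.0]])  # one example each for two classes
labels = ["a", "b"]
print(one_shot_predict(support, labels, np.array([9.0, 9.5]), neg_dist))  # "b"
```

In this framing, all of the modelling effort goes into the similarity function; a learned comparator such as the one described above simply replaces `neg_dist`.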

At the most basic level, we presented a model that uses attention and recurrence to cycle through a set of images repeatedly and estimate their similarity. We showed that this model is not only viable but also much better, in both performance and generalization, than the siamese neural networks in wide use today. Taking a step back, we showed the value of having dynamic representations and presented a novel way of crudely approximating them. Our main result is in the task of one-shot classification on the Omniglot dataset, where we achieved state-of-the-art performance, surpassing both HBPL and human performance.
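The core loop described above — a recurrent controller whose attention alternates between the two images, each glimpse conditioned on the current hidden state — can be sketched roughly as follows. The tiny untrained RNN, the hard crop-style attention, and all dimensions here are illustrative assumptions, not the published architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
H, G = 16, 4  # hidden size and glimpse size (G x G crop); both illustrative

Wh = rng.normal(0, 0.1, (H, H))      # hidden -> hidden weights
Wg = rng.normal(0, 0.1, (H, G * G))  # glimpse -> hidden weights
Wa = rng.normal(0, 0.1, (2, H))      # hidden -> (row, col) attention parameters
wo = rng.normal(0, 0.1, H)           # hidden -> similarity score

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def glimpse(img, h):
    """Hard crop whose top-left corner is predicted from the hidden state."""
    r, c = (sigmoid(Wa @ h) * (img.shape[0] - G)).astype(int)
    return img[r:r + G, c:c + G].ravel()

def arc_similarity(img_a, img_b, n_cycles=4):
    """Alternate glimpses between the two images, then score the final state."""
    h = np.zeros(H)
    for t in range(2 * n_cycles):
        img = img_a if t % 2 == 0 else img_b  # attention ping-pongs between the pair
        h = np.tanh(Wh @ h + Wg @ glimpse(img, h))
    return sigmoid(wo @ h)  # untrained score in (0, 1); training would shape it

a = rng.random((8, 8))
print(arc_similarity(a, a))
```

The key property this sketch preserves is that where the controller looks next depends on everything it has seen so far from both images, which is what distinguishes this iterative comparison from a siamese network's single independent embedding per image.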

Though presented in the context of images, ARCs can be used in any modality. There are innumerable ways to extend ARCs: better attention mechanisms, higher-resolution images, different datasets, hyper-parameter tuning, more complicated controllers, etc., are simple steps that could be taken to achieve better performance.

More interesting extensions would involve developing more complex architectures using this bottom-up, lazy approach to solve even more challenging AI tasks.

Neural Episodic Control

We propose Neural Episodic Control: a deep reinforcement learning agent that is able to rapidly assimilate new experiences and act upon them. Our agent uses a semi-tabular representation of the value function: a buffer of past experience containing slowly changing state representations and rapidly updated estimates of the value function. We show across a wide range of environments that our agent learns significantly faster than other state-of-the-art, general-purpose deep reinforcement learning agents.

Guided Perturbations: Self Corrective Behavior in Convolutional Neural Networks

In this work, we present an intriguing behavior: pre-trained CNNs can be made to improve their predictions by structurally perturbing the input. We observe that these perturbations - referred to as Guided Perturbations - enable a trained network to improve its prediction performance without any learning or change in network weights. We perform various ablative experiments to understand how these perturbations affect the local context and feature representations. Furthermore, we demonstrate that this idea can improve the performance of several existing approaches on semantic segmentation and scene labeling tasks on the PASCAL VOC dataset and on supervised classification tasks on the MNIST and CIFAR10 datasets.

Learning to Compute Word Embeddings on the Fly

Words in natural language follow a Zipfian distribution whereby some words are frequent but most are rare. Learning representations for words in the “long tail” of this distribution requires enormous amounts of data. Representations of rare words trained directly on end tasks are usually poor, requiring us to pre-train embeddings on external data or to treat all rare words as out-of-vocabulary words with a single shared representation. We provide a method for predicting embeddings of rare words on the fly from small amounts of auxiliary data, with a network trained against the end task. We show that this improves results over baselines in which embeddings are trained only on the end task, on a reading comprehension task, a recognizing-textual-entailment task, and in language modelling.

Federated Learning: Collaborative Machine Learning without Centralized Training Data

Federated Learning enables mobile phones to collaboratively learn a shared prediction model while keeping all the training data on device, decoupling the ability to do machine learning from the need to store the data in the cloud. This goes beyond the use of local models that make predictions on mobile devices (like the Mobile Vision API and On-Device Smart Reply) by bringing model training to the device as well.

Anytime Neural Networks via Joint Optimization of Auxiliary Losses

An anytime predictor automatically adjusts to and utilizes the available test-time budget: it produces a crude initial result quickly and continuously refines the result afterwards. Traditional feed-forward networks achieve state-of-the-art performance on many machine learning tasks, but cannot produce anytime predictions during their typically expensive computation. In this work, we propose to add auxiliary predictions in a residual network to generate anytime predictions, and to optimize these predictions simultaneously. We solve this multi-objective optimization by minimizing a carefully constructed weighted sum of losses. We also oscillate the weightings of the losses across iterations to avoid spurious solutions that are optimal for the sum but not for each individual loss.

Understanding Attentive Recurrent Comparators