To this end, we propose “fidelity-weighted learning” (FWL), a semi-supervised student-teacher approach for training deep neural networks using weakly-labeled data. FWL modulates the parameter updates to a student network (trained on the task we care about) on a per-sample basis according to the posterior confidence of its label-quality estimated by a teacher (who has access to the high-quality labels). Both student and teacher are learned from the data. We evaluate FWL on two tasks in information retrieval and natural language processing where we outperform state-of-the-art alternative semi-supervised methods, indicating that our approach makes better use of strong and weak labels, and leads to better task-dependent data representations.
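
A minimal sketch of the core idea, assuming a toy student network and a stand-in teacher that just returns a per-sample confidence in [0, 1] (in the paper the teacher is fit on the small set of strong labels): each weakly-labeled sample's loss, and hence its gradient update, is scaled by the teacher's confidence in that sample's label.

<code python>
import torch
import torch.nn as nn

# Hypothetical student network for a toy 1-D regression task.
student = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.SGD(student.parameters(), lr=0.1)

def teacher_confidence(x, y_weak):
    # Stand-in for the teacher's posterior confidence in each weak label;
    # in FWL this comes from a model fit on the strongly-labeled data.
    return torch.rand(x.shape[0])

x_weak = torch.randn(64, 16)   # weakly-labeled inputs
y_weak = torch.randn(64, 1)    # noisy (weak) targets

# Fidelity-weighted step: low-confidence samples barely move the student.
conf = teacher_confidence(x_weak, y_weak)                   # (64,)
per_sample = (student(x_weak) - y_weak).pow(2).squeeze(1)   # (64,)
loss = (conf * per_sample).mean()
opt.zero_grad()
loss.backward()
opt.step()
</code>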

https://hazyresearch.github.io/snorkel/blog/ws_blog_post.html

https://arxiv.org/abs/1704.08803 Neural Ranking Models with Weak Supervision

Hence, in this paper, we propose to train a neural ranking model using weak supervision, where labels are obtained automatically without human annotators or any external resources (e.g., click data). To this aim, we use the output of an unsupervised ranking model, such as BM25, as a weak supervision signal.
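
A toy illustration of this setup (the term-overlap weak_score function is only a stand-in for BM25, and the pointwise regression loss is a simplification of the paper's ranking objectives): every query–document pair gets its target from the unsupervised ranker, so no human relevance labels appear anywhere in the loop.

<code python>
import torch
import torch.nn as nn

def weak_score(query, doc):
    # Stand-in for BM25: fraction of document terms that appear in the query.
    q_terms = set(query.lower().split())
    d_terms = doc.lower().split()
    return sum(t in q_terms for t in d_terms) / max(len(d_terms), 1)

def features(query, doc):
    # Hypothetical hand-crafted features; the paper instead learns dense
    # representations of queries and documents.
    q_terms, d_terms = set(query.lower().split()), doc.lower().split()
    overlap = sum(t in q_terms for t in d_terms)
    return torch.tensor([float(overlap), float(len(d_terms)), float(len(q_terms))])

queries = ["neural ranking", "weak supervision"]
docs = ["neural models for ranking documents", "labels obtained without annotators"]

ranker = nn.Sequential(nn.Linear(3, 8), nn.ReLU(), nn.Linear(8, 1))
opt = torch.optim.Adam(ranker.parameters(), lr=1e-2)

for _ in range(200):
    losses = []
    for q in queries:
        for d in docs:
            target = torch.tensor([weak_score(q, d)])   # weak label from BM25 stand-in
            losses.append((ranker(features(q, d)) - target).pow(2))
    loss = torch.stack(losses).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
</code>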

http://metalearning.ml/papers/metalearn17_dehghani.pdf Learning to Learn from Weak Supervision by Full Supervision

In this paper, we propose a method for training neural networks when we have a large set of data with weak labels and a small amount of data with true labels. In our proposed model, we train two neural networks: a target network, the learner, and a confidence network, the meta-learner. The target network is optimized to perform a given task and is trained using a large set of unlabeled data that are weakly annotated. We propose to control the magnitude of the gradient updates to the target network using the scores provided by the second confidence network, which is trained on a small amount of supervised data. Thus we prevent weight updates computed from noisy labels from harming the quality of the target network model.
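
A rough sketch of the two-network scheme under simplified assumptions (tiny MLPs, a simulated weak-labeling process, a squared-error task; the fidelity targets used to fit the confidence network are one plausible choice, not the paper's exact recipe):

<code python>
import torch
import torch.nn as nn

target_net = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
conf_net = nn.Sequential(nn.Linear(9, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())

# 1) Fit the confidence network on the small strongly-labeled set: it sees
#    (input, weak label) and predicts how trustworthy that weak label is.
x_strong = torch.randn(32, 8)
y_true = torch.randn(32, 1)
y_weak_on_strong = y_true + 0.5 * torch.randn(32, 1)       # simulated weak labels
fidelity = torch.exp(-(y_weak_on_strong - y_true).pow(2))  # targets in (0, 1]

c_opt = torch.optim.Adam(conf_net.parameters(), lr=1e-2)
for _ in range(200):
    pred = conf_net(torch.cat([x_strong, y_weak_on_strong], dim=1))
    c_loss = (pred - fidelity).pow(2).mean()
    c_opt.zero_grad()
    c_loss.backward()
    c_opt.step()

# 2) Train the target network on the large weakly-labeled set, scaling each
#    sample's loss (and hence its gradient update) by the predicted confidence.
x_weak = torch.randn(256, 8)
y_weak = torch.randn(256, 1)
t_opt = torch.optim.SGD(target_net.parameters(), lr=0.05)
with torch.no_grad():
    w = conf_net(torch.cat([x_weak, y_weak], dim=1))        # (256, 1)
loss = (w * (target_net(x_weak) - y_weak).pow(2)).mean()
t_opt.zero_grad()
loss.backward()
t_opt.step()
</code>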

https://openreview.net/pdf?id=ByoT9Fkvz Learning to Learn without Labels

By recasting unsupervised learning as meta-learning, we treat the creation of the unsupervised update rule as a transfer learning problem. Instead of learning transferable features, such as done in (Vinyals et al., 2016; Ravi & Larochelle, 2016; Snell et al., 2017), we learn a transferable learning rule which does not require access to labels and generalizes across domains. Although we focus on the meta-objective of semi-supervised classification here, in principle a learning rule could be optimized to generate representations for any subsequent task.

https://papers.nips.cc/paper/7278-learning-to-model-the-tail Learning to Model the Tail

We describe an approach to learning from long-tailed, imbalanced datasets that are prevalent in real-world settings.

https://arxiv.org/abs/1804.00092 Iterative Learning with Open-set Noisy Labels

Large-scale datasets possessing clean label annotations are crucial for training Convolutional Neural Networks (CNNs). However, labeling large-scale data can be very costly and error-prone, and even high-quality datasets are likely to contain noisy (incorrect) labels. Existing works usually employ a closed-set assumption, whereby the samples associated with noisy labels possess a true class contained within the set of known classes in the training data. However, such an assumption is too restrictive for many applications, since samples associated with noisy labels might in fact possess a true class that is not present in the training data. We refer to this more complex scenario as the open-set noisy label problem and show that it is nontrivial to make accurate predictions. To address this problem, we propose a novel iterative learning framework for training CNNs on datasets with open-set noisy labels. Our approach detects noisy labels and learns deep discriminative features in an iterative fashion. To benefit from the noisy label detection, we design a Siamese network to encourage clean labels and noisy labels to be dissimilar. A reweighting module is also applied to simultaneously emphasize the learning from clean labels and reduce the effect caused by noisy labels. Experiments on CIFAR-10, ImageNet and real-world noisy (web-search) datasets demonstrate that our proposed model can robustly train CNNs in the presence of a high proportion of open-set as well as closed-set noisy labels.
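
A much-simplified sketch of the detect-then-reweight idea (a nearest-neighbour label-agreement score on toy data stands in for the paper's Siamese-network noisy-label detector): training alternates between fitting features with per-sample weighted losses and re-estimating those weights from the learned feature space.

<code python>
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy two-class data in which some labels are wrong (possibly open-set noise).
x = torch.randn(200, 16)
y = torch.randint(0, 2, (200,))

encoder = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))
classifier = nn.Linear(8, 2)
opt = torch.optim.Adam(list(encoder.parameters()) + list(classifier.parameters()), lr=1e-3)

weights = torch.ones(200)   # per-sample weights; everything starts as "clean"

for _ in range(5):          # iterate: fit features, then re-estimate noisiness
    for _ in range(100):
        z = encoder(x)
        loss = (weights * F.cross_entropy(classifier(z), y, reduction="none")).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()

    # Re-estimate weights: a sample whose label disagrees with its nearest
    # neighbours in the learned feature space is treated as likely noise.
    with torch.no_grad():
        z = F.normalize(encoder(x), dim=1)
        sim = z @ z.t()
        sim.fill_diagonal_(-1.0)
        nn_idx = sim.topk(10, dim=1).indices
        weights = (y[nn_idx] == y.unsqueeze(1)).float().mean(dim=1)
</code>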

https://arxiv.org/abs/1804.03273v1 On the Supermodularity of Active Graph-based Semi-supervised Learning with Stieltjes Matrix Regularization

https://papers.nips.cc/paper/6469-dual-learning-for-machine-translation.pdf Dual Learning for Machine Translation

This mechanism is inspired by the following observation: any machine translation task has a dual task, e.g., English-to-French translation (primal) versus French-to-English translation (dual); the primal and dual tasks can form a closed loop, and generate informative feedback signals to train the translation models, even without the involvement of a human labeler. In the dual-learning mechanism, we use one agent to represent the model for the primal task and the other agent to represent the model for the dual task, then ask them to teach each other through a reinforcement learning process. Based on the feedback signals generated during this process (e.g., the language-model likelihood of the output of a model, and the reconstruction error of the original sentence after the primal and dual translations), we can iteratively update the two models until convergence (e.g., using the policy gradient methods). We call the corresponding approach to neural machine translation dual-NMT.
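
A toy walk-through of one round of the closed loop (all four functions below are stand-ins for the two translation models and a target-side language model, and the overlap-based reconstruction reward is a simplification): the reward computed at the end is the feedback signal that would drive policy-gradient updates of both models, and it needs no labeled sentence pair.

<code python>
# Stand-ins for the two translation models and a target-side language model.
def translate_en_fr(sentence):         # primal model (English -> French)
    return "traduction de " + sentence

def translate_fr_en(sentence):         # dual model (French -> English)
    return sentence.replace("traduction de ", "")

def lm_score_fr(sentence):             # toy French language-model score
    return -0.1 * abs(len(sentence.split()) - 8)

def reconstruction_reward(original, reconstructed):
    a, b = set(original.split()), set(reconstructed.split())
    return len(a & b) / max(len(a | b), 1)

alpha = 0.5
english = "the cat sat on the mat"     # an unlabeled monolingual sentence

french = translate_en_fr(english)      # primal step
back = translate_fr_en(french)         # dual step closes the loop
reward = alpha * lm_score_fr(french) + (1 - alpha) * reconstruction_reward(english, back)
print(reward)   # feedback signal for policy-gradient updates of both models
</code>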

https://arxiv.org/abs/1606.04596 Semi-Supervised Learning for Neural Machine Translation

While end-to-end neural machine translation (NMT) has made remarkable progress recently, NMT systems only rely on parallel corpora for parameter estimation. Since parallel corpora are usually limited in quantity, quality, and coverage, especially for low-resource languages, it is appealing to exploit monolingual corpora to improve NMT. We propose a semi-supervised approach for training NMT models on the concatenation of labeled (parallel corpora) and unlabeled (monolingual corpora) data. The central idea is to reconstruct the monolingual corpora using an autoencoder, in which the source-to-target and target-to-source translation models serve as the encoder and decoder, respectively. Our approach can not only exploit the monolingual corpora of the target language, but also of the source language. Experiments on the Chinese-English dataset show that our approach achieves significant improvements over state-of-the-art SMT and NMT systems.
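
A minimal sketch of the combined objective (the log-probability and sampling functions are toy stand-ins for the jointly trained translation models): the loss adds the usual parallel-data likelihood to a reconstruction term on monolingual source-language sentences, with the source-to-target model acting as the encoder and the target-to-source model as the decoder.

<code python>
# Toy stand-ins for the two translation models' scores and sampling.
def log_p_src2tgt(tgt, src):
    # Stand-in log-probability of a target sentence given a source sentence.
    return -abs(len(tgt.split()) - len(src.split()))

def log_p_tgt2src(src, tgt):
    return -abs(len(src.split()) - len(tgt.split()))

def sample_tgt_given_src(src):
    # Stand-in for sampling a translation from the source-to-target model.
    return " ".join(reversed(src.split()))

def semi_supervised_loss(parallel_pairs, mono_source, lam=1.0):
    # Supervised term: likelihood of the parallel corpus.
    sup = -sum(log_p_src2tgt(t, s) for s, t in parallel_pairs)
    # Unsupervised term: reconstruct each monolingual source sentence, with the
    # source-to-target model as encoder and the target-to-source model as decoder.
    unsup = -sum(log_p_tgt2src(x, sample_tgt_given_src(x)) for x in mono_source)
    return sup + lam * unsup

pairs = [("le chat dort", "the cat sleeps")]
mono = ["le chien court vite"]
print(semi_supervised_loss(pairs, mono))
</code>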

https://arxiv.org/abs/1804.09170v1 Realistic Evaluation of Deep Semi-Supervised Learning Algorithms

We argue that these benchmarks fail to address many issues that these algorithms would face in real-world applications. After creating a unified reimplementation of various widely-used SSL techniques, we test them in a suite of experiments designed to address these issues. We find that the performance of simple baselines which do not use unlabeled data is often underreported, that SSL methods differ in sensitivity to the amount of labeled and unlabeled data, and that performance can degrade substantially when the unlabeled dataset contains out-of-class examples. To help guide SSL research towards real-world applicability, we make our unified reimplementation and evaluation platform publicly available.

https://arxiv.org/abs/1808.08485v1 Deep Probabilistic Logic: A Unifying Framework for Indirect Supervision

In this paper, we propose deep probabilistic logic (DPL) as a general framework for indirect supervision, by composing probabilistic logic with deep learning. DPL models label decisions as latent variables, represents prior knowledge on their relations using weighted first-order logical formulas, and alternates between learning a deep neural network for the end task and refining uncertain formula weights for indirect supervision, using variational EM. This framework subsumes prior indirect supervision methods as special cases, and enables novel combination via infusion of rich domain and linguistic knowledge. http://hanover.azurewebsites.net/
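
A deliberately tiny sketch of the alternation DPL describes (toy rules, bag-of-words features, and a linear classifier in place of a deep network; the weight-refinement step is a crude agreement heuristic rather than the paper's variational EM): weighted rules induce soft labels over the latent label variables, a model is trained against them, and the rule weights are then nudged toward agreement with the trained model.

<code python>
import torch
import torch.nn as nn
import torch.nn.functional as F

sentences = ["great movie", "terrible plot", "great acting", "terrible terrible ending"]
vocab = ["great", "terrible", "movie", "plot", "acting", "ending"]
X = torch.tensor([[1.0 if w in s else 0.0 for w in vocab] for s in sentences])

# Each rule is a (predicate, weight) pair voting for the positive class.
rules = [(lambda s: "great" in s, 1.0), (lambda s: "terrible" in s, -1.0)]

def soft_labels():
    scores = torch.tensor([sum(w for r, w in rules if r(s)) for s in sentences])
    return torch.sigmoid(scores)            # P(label = positive) per sentence

model = nn.Linear(len(vocab), 1)
opt = torch.optim.Adam(model.parameters(), lr=0.1)

for _ in range(5):                          # alternate, in the spirit of variational EM
    q = soft_labels()
    for _ in range(100):                    # fit the classifier to the current soft labels
        loss = F.binary_cross_entropy_with_logits(model(X).squeeze(1), q)
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():                   # refine rule weights toward model agreement
        p = torch.sigmoid(model(X)).squeeze(1)
        new_rules = []
        for r, w in rules:
            fired = torch.tensor([r(s) for s in sentences])
            if fired.any():
                w = w + 0.1 * (2 * p[fired] - 1).mean().item()
            new_rules.append((r, w))
        rules = new_rules
</code>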

https://openreview.net/forum?id=r1g7y2RqYX Label Propagation Networks

https://arxiv.org/abs/1810.02840 Training Complex Models with Multi-Task Weak Supervision

We show that by solving a matrix completion-style problem, we can recover the accuracies of these multi-task sources given their dependency structure, but without any labeled data, leading to higher-quality supervision for training an end model. Theoretically, we show that the generalization error of models trained with this approach improves with the number of unlabeled data points, and characterize the scaling with respect to the task and dependency structures. On three fine-grained classification problems, we show that our approach leads to average gains of 20.2 points in accuracy over a traditional supervised approach, 6.8 points over a majority vote baseline, and 4.1 points over a previously proposed weak supervision method that models tasks separately.

https://colinraffel.com/publications/nips2018realistic.pdf Realistic Evaluation of Deep Semi-Supervised Learning Algorithms. https://github.com/brain-research/realistic-ssl-evaluation