# Multi-Label Classification

**Aliases** Multi-Category Classification

**Intent**

Assign to each input a set of labels rather than a single class.

**Motivation**

How can we build classifiers whose predictions can span several classes at once, rather than committing to a single class?

**Sketch**

*This section provides alternative descriptions of the pattern in the form of an illustration or alternative formal expression. By looking at the sketch a reader may quickly understand the essence of the pattern.*

**Discussion**

*This is the main section of the pattern that goes in greater detail to explain the pattern. We leverage a vocabulary that we describe in the theory section of this book. We don't go into intense detail in providing proofs but rather reference the sources of the proofs. How the motivation is addressed is expounded upon in this section. We also include additional questions that may be interesting topics for future research.*

**Known Uses**

*Here we review several projects or papers that have used this pattern.*

**Related Patterns**
*In this section we describe in a diagram how this pattern is conceptually related to other patterns. The relationships may be precise or fuzzy, so we provide further explanation of the nature of the relationship. We also describe other patterns that may not be conceptually related but work well in combination with this pattern.*

*Relationship to Canonical Patterns*

*Relationship to Other Patterns*

**Further Reading**

*We provide here some additional external material that will help in exploring this pattern in more detail.*

**References**

*To aid in reading, we include sources that are referenced in the text in the pattern.*

http://jmlr.org/proceedings/papers/v48/cisse16.pdf ADIOS: Architectures Deep In Output Space

Multi-label classification is a generalization of binary classification in which the task consists of predicting sets of labels.
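The simplest baseline for predicting label sets treats each label as an independent binary problem: one sigmoid per label, thresholded. A minimal numpy sketch (the logits and threshold are illustrative, not from the paper):

```python
import numpy as np

# Hypothetical per-label logits from some model: one score per label, per example.
logits = np.array([
    [2.0, -1.0, 0.5],   # example 1
    [-3.0, 4.0, 1.2],   # example 2
])

def predict_label_sets(logits, threshold=0.5):
    """Independent per-label sigmoids; keep every label whose
    probability exceeds the threshold, yielding a label *set*."""
    probs = 1.0 / (1.0 + np.exp(-logits))
    return probs > threshold

preds = predict_label_sets(logits)
# Each row is a boolean label set: several labels can be active at once.
```

This is the "independent classifiers" baseline the CNN-RNN paper below argues against, since it ignores label dependencies.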

http://arxiv.org/abs/1607.05691v1 Information-theoretical label embeddings for large-scale image classification

We present a method for training multi-label, massively multi-class image classification models that is faster and more accurate than supervision via a sigmoid cross-entropy loss (logistic regression). Our method consists of embedding high-dimensional sparse labels onto a lower-dimensional dense sphere of unit-normed vectors, and treating the classification problem as a cosine proximity regression problem on this sphere.
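The embedding-and-cosine-regression idea can be sketched in a few lines of numpy. Here the label embeddings are random unit vectors standing in for the information-theoretical embeddings the paper derives; the dimensions and the set-embedding rule are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

n_labels, dim = 1000, 32   # many sparse labels mapped onto a low-dim sphere

# Stand-in label embeddings: random vectors normalized to unit length.
E = rng.standard_normal((n_labels, dim))
E /= np.linalg.norm(E, axis=1, keepdims=True)

def embed_label_set(active):
    """Embed a sparse multi-hot target as the normalized sum of its
    labels' unit vectors (one point on the sphere per label set)."""
    v = E[active].sum(axis=0)
    return v / np.linalg.norm(v)

def cosine_loss(pred, target):
    """Regress the model output toward the target on the sphere:
    1 - cosine similarity, minimized when the directions coincide."""
    pred = pred / np.linalg.norm(pred)
    return 1.0 - pred @ target

target = embed_label_set([3, 17, 256])
```

Training then replaces a 1000-way sigmoid cross-entropy with a 32-dimensional regression, which is the source of the claimed speedup.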

http://arxiv.org/abs/1502.02710 Scalable Multilabel Prediction via Randomized Methods

We show that a generic regularized nonlinearity mapping independent predictions to joint predictions is sufficient to achieve state-of-the-art performance on a variety of benchmark problems. Crucially, we compute the joint predictions without ever obtaining any independent predictions, while incorporating low-rank and smoothness regularization. We achieve this by leveraging randomized algorithms for matrix decomposition and kernel approximation.
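The randomized matrix-decomposition ingredient can be illustrated with a randomized range finder (in the style of Halko et al.): project the label matrix onto a few random directions and orthonormalize, recovering a low-rank approximation without a full SVD. The toy data and rank below are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy label matrix (n examples x L labels) with exact low-rank structure,
# standing in for a large multilabel dataset.
n, L, true_rank = 200, 50, 5
Y = rng.standard_normal((n, true_rank)) @ rng.standard_normal((true_rank, L))

def randomized_range(Y, k, oversample=5):
    """Randomized range finder: multiply by a random test matrix,
    then orthonormalize the result with a thin QR."""
    Omega = rng.standard_normal((Y.shape[1], k + oversample))
    Q, _ = np.linalg.qr(Y @ Omega)
    return Q

Q = randomized_range(Y, k=5)
Y_lowrank = Q @ (Q.T @ Y)   # rank-limited reconstruction of the labels
err = np.linalg.norm(Y - Y_lowrank) / np.linalg.norm(Y)
```

Because `Y` is exactly rank 5 and the sketch uses 10 random directions, the reconstruction error is essentially zero; on real label matrices the same procedure gives a controlled approximation at a fraction of the cost of a full decomposition.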

http://arxiv.org/pdf/1607.05709v1.pdf Multi-category Angle-based Classifier Refit

However, for the simultaneous multicategory classification framework, much less work has been done. We fill the gap in this paper. In particular, we give theoretical insights on why heavy regularization terms are often needed in high-dimensional applications, and how this can lead to bias in probability estimation. To overcome this difficulty, we propose a new refit strategy for multicategory angle-based classifiers. Our new method only adds a small computation cost to the problem, and is able to attain prediction accuracy that is as good as the regular margin-based classifiers.

http://arxiv.org/abs/1506.05439v3 Learning with a Wasserstein Loss

The Wasserstein distance provides a natural notion of dissimilarity for probability measures. Although optimizing with respect to the exact Wasserstein distance is costly, recent work has described a regularized approximation that is efficiently computed. We describe an efficient learning algorithm based on this regularization, as well as a novel extension of the Wasserstein distance from probability measures to unnormalized measures. We also describe a statistical learning bound for the loss.
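The regularized approximation referred to is entropic regularization computed by Sinkhorn iterations (Cuturi). A minimal numpy sketch on a toy label space shows the property that motivates a Wasserstein loss: mistakes between metrically close labels cost less than mistakes between distant ones. All sizes and the cost matrix here are illustrative:

```python
import numpy as np

def sinkhorn_distance(a, b, C, eps=0.1, iters=200):
    """Entropy-regularized Wasserstein distance: alternately rescale the
    rows and columns of exp(-C/eps) to match the two marginals, then
    evaluate <P, C> on the resulting transport plan P."""
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]   # approximate transport plan
    return np.sum(P * C)

# Three "labels" on a line; ground cost = squared distance between positions.
pos = np.array([0.0, 1.0, 2.0])
C = (pos[:, None] - pos[None, :]) ** 2
a = np.array([1.0, 0.0, 0.0])          # predicted mass on label 0
b_far = np.array([0.0, 0.0, 1.0])      # target mass on label 2
b_near = np.array([0.0, 1.0, 0.0])     # target mass on label 1
d_far = sinkhorn_distance(a, b_far, C)
d_near = sinkhorn_distance(a, b_near, C)
# Unlike cross-entropy, the loss reflects the label metric: d_near < d_far.
```

A per-label cross-entropy would penalize both mistakes identically; the Wasserstein loss instead charges for how far probability mass must be moved.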

https://arxiv.org/abs/1604.04573 CNN-RNN: A Unified Framework for Multi-label Image Classification

Traditional approaches to multi-label image classification learn independent classifiers for each category and employ ranking or thresholding on the classification results. These techniques, although working well, fail to explicitly exploit the label dependencies in an image. In this paper, we utilize recurrent neural networks (RNNs) to address this problem. Combined with CNNs, the proposed CNN-RNN framework learns a joint image-label embedding to characterize the semantic label dependency as well as the image-label relevance, and it can be trained end-to-end from scratch to integrate both information in a unified framework.

An illustration of the CNN-RNN framework for multilabel image classification. The framework learns a joint embedding space to characterize the image-label relationship as well as label dependency. The red and blue dots are the label and image embeddings, respectively, and the black dots are the sum of the image and recurrent neuron output embeddings. The recurrent neurons model the label co-occurrence dependencies in the joint embedding space by sequentially linking the label embeddings in the joint embedding space. At each time step, the probability of a label is computed based on the image embedding and the output of the recurrent neurons.
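The decoding loop described above can be caricatured in numpy: greedy label-by-label prediction, with random stand-ins for all learned parameters, so this is a sketch of the mechanism's shape rather than the paper's trained model:

```python
import numpy as np

rng = np.random.default_rng(2)
L, d = 6, 8                             # number of labels, joint embedding dim

# Hypothetical learned parameters (random stand-ins for illustration).
U_label = rng.standard_normal((L, d))   # label embeddings ("red dots")
W_img = rng.standard_normal((d, d))     # projects the CNN image feature
W_rec = rng.standard_normal((d, d))     # recurrent transition

def predict_sequence(img_emb, steps=3):
    """Greedy sketch of the CNN-RNN decoding loop: at each step, combine
    the image embedding with the recurrent state, score every label in
    the joint space, emit the best unseen label, and feed its embedding
    back into the recurrence to condition the next step."""
    h = np.zeros(d)
    chosen = []
    for _ in range(steps):
        joint = np.tanh(W_img @ img_emb + W_rec @ h)  # the "black dot"
        scores = U_label @ joint
        scores[chosen] = -np.inf                      # no repeated labels
        k = int(np.argmax(scores))
        chosen.append(k)
        h = np.tanh(W_rec @ h + U_label[k])           # condition on the label
    return chosen

labels = predict_sequence(rng.standard_normal(d))
```

The recurrence is what lets one predicted label raise or lower the probability of the next, which is exactly the label co-occurrence dependency the independent-classifier baseline cannot express.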