
Data Augmentation



Artificially modifying the training data so as to create more examples.


How can we augment the training examples so that the network learns to predict more accurately?


This section provides alternative descriptions of the pattern in the form of an illustration or an alternative formal expression. By looking at the sketch, a reader may quickly understand the essence of the pattern.

Discussion

This is the main section of the pattern, which explains the pattern in greater detail. We leverage the vocabulary described in the theory section of this book. Rather than providing detailed proofs, we reference the sources of the proofs. This section expounds on how the motivation is addressed, and includes additional questions that may be interesting topics for future research.

Known Uses

Here we review several projects or papers that have used this pattern.

Related Patterns

In this section we describe in a diagram how this pattern is conceptually related to other patterns. The relationships may be precise or fuzzy, so we provide further explanation of the nature of each relationship. We also describe other patterns that may not be conceptually related but work well in combination with this pattern.

Relationship to Canonical Patterns

Relationship to other Patterns

Further Reading

We provide here some additional external material that will help in exploring this pattern in more detail.


To aid the reader, we include the sources that are referenced in the text of the pattern.

Since deep networks need to be trained on a huge number of images to achieve satisfactory performance, it is advisable to use data augmentation to boost performance when the original data set contains only a limited number of training images. Indeed, data augmentation has become a practical necessity when training a deep network.

There are many ways to perform data augmentation, such as the popular horizontal flipping, random crops, and color jittering. You can also combine multiple transformations, e.g., applying rotation and random scaling at the same time. In addition, you can raise the saturation and value (the S and V components of the HSV color space) of all pixels to a power between 0.25 and 4 (the same exponent for all pixels within a patch), multiply these values by a factor between 0.7 and 1.4, and add to them a value between -0.1 and 0.1. You can also add a value between -0.1 and 0.1 to the hue (the H component of HSV) of all pixels in the image or patch.
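The transformations above can be sketched in a few lines of numpy. This is a minimal illustration, assuming the image has already been converted to HSV with all three channels scaled to [0, 1]; the function names are illustrative, and a real pipeline would use a library such as torchvision, OpenCV, or albumentations.

```python
import numpy as np

rng = np.random.default_rng(0)

def jitter_hsv(hsv):
    """Randomly perturb an HxWx3 HSV image (all channels in [0, 1]):
    raise S and V to a power in [0.25, 4], scale by a factor in
    [0.7, 1.4], add an offset in [-0.1, 0.1], and shift H by a value
    in [-0.1, 0.1]. One draw of each parameter per image/patch."""
    hsv = hsv.copy()
    power = rng.uniform(0.25, 4.0)   # same exponent for every pixel
    scale = rng.uniform(0.7, 1.4)
    offset = rng.uniform(-0.1, 0.1)
    hsv[..., 1:] = np.clip(hsv[..., 1:] ** power * scale + offset, 0.0, 1.0)
    hsv[..., 0] = (hsv[..., 0] + rng.uniform(-0.1, 0.1)) % 1.0  # hue wraps
    return hsv

def random_flip_and_crop(image, crop_frac=0.75):
    """Horizontal flip with probability 0.5, then a random crop whose
    sides are crop_frac of the original image's sides."""
    if rng.random() < 0.5:
        image = image[:, ::-1, :]
    h, w = image.shape[:2]
    ch, cw = int(h * crop_frac), int(w * crop_frac)
    top = rng.integers(0, h - ch + 1)
    left = rng.integers(0, w - cw + 1)
    return image[top:top + ch, left:left + cw]
```

Because the random parameters are redrawn on every call, the same training image yields a different augmented version each epoch.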

Krizhevsky et al. [1] proposed fancy PCA when training the famous AlexNet in 2012. Fancy PCA alters the intensities of the RGB channels in the training images. In practice, you first perform PCA on the set of RGB pixel values across your training images. Then, for each training image, add the following quantity to every RGB pixel (i.e., Ixy = [IxyR, IxyG, IxyB]^T): [p1, p2, p3][α1λ1, α2λ2, α3λ3]^T, where pi and λi are the i-th eigenvector and eigenvalue of the 3 × 3 covariance matrix of RGB pixel values, respectively, and αi is a random variable drawn from a Gaussian with mean zero and standard deviation 0.1. Note that each αi is drawn only once for all the pixels of a particular training image, until that image is used for training again; when the model sees the same training image again, new values of αi are drawn. In [1], the authors claimed that fancy PCA could "approximately capture an important property of natural images, namely, that object identity is invariant to changes in the intensity and color of the illumination". In terms of classification performance, this scheme reduced the top-1 error rate by over 1% in the ImageNet 2012 competition.
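A minimal numpy sketch of the fancy PCA recipe described above; the function name and array layout (N x H x W x 3 float RGB) are assumptions for illustration, not Krizhevsky et al.'s original implementation.

```python
import numpy as np

def fancy_pca(images, sigma=0.1, rng=None):
    """Fancy PCA color augmentation: shift every pixel of each image
    along the principal components of the training set's RGB values."""
    if rng is None:
        rng = np.random.default_rng()
    pixels = images.reshape(-1, 3)
    # 3x3 covariance matrix of RGB values over the whole training set.
    cov = np.cov(pixels, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # lambda_i, and p_i as columns
    out = np.empty_like(images)
    for i, img in enumerate(images):
        # One draw of alpha per image per pass, as described in the text.
        alpha = rng.normal(0.0, sigma, size=3)
        # [p1, p2, p3][alpha1*lambda1, alpha2*lambda2, alpha3*lambda3]^T
        shift = eigvecs @ (alpha * eigvals)
        out[i] = img + shift                 # same RGB shift for every pixel
    return out
```

In a real training loop the covariance would be computed once up front, and only the per-image alpha draw and shift would happen per epoch.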

Simard, Patrice Y., Steinkraus, Dave, and Platt, John C. Best practices for convolutional neural networks applied to visual document analysis. In Proceedings of the Seventh International Conference on Document Analysis and Recognition (ICDAR), pp. 958–963. IEEE, 2003.

During training, examples are randomly perturbed with transformations from this class, to encourage the network to produce the correct result regardless of how the input is transformed.

Exploiting Cyclic Symmetry in CNNs

A data augmentation methodology for training machine/deep learning gait recognition algorithms

We introduce a simulation-based methodology and a subject-specific dataset which can be used for generating synthetic video frames and sequences for data augmentation. With this methodology, we generated a multi-modal dataset.

Generating Paraphrases from DBPedia using Deep Learning

Data Programming NIPS 2016 Spotlight Video

Data Programming with DDLite: Putting Humans in a Different Part of the Loop

Generative Shape Models: Joint Text Recognition and Segmentation with Very Little Training Data

Improving Information Extraction by Acquiring External Evidence with Reinforcement Learning

Linguistic Knowledge as Memory for Recurrent Neural Networks

Specifically, external knowledge is used to augment a sequence with typed edges between arbitrarily distant elements, and the resulting graph is decomposed into directed acyclic subgraphs. We introduce a model that encodes such graphs as explicit memory in recurrent neural networks, and use it to model coreference relations in text. We apply our model to several text comprehension tasks and achieve new state-of-the-art results on all considered benchmarks, including CNN, bAbI, and LAMBADA. On the bAbI QA tasks, our model solves 15 out of the 20 tasks with only 1000 training examples per task. Analysis of the learned representations further demonstrates the ability of our model to encode fine-grained entity information across a document.

Data Programming: Creating Large Training Sets, Quickly

Improving Deep Learning using Generic Data Augmentation

Various geometric and photometric schemes are evaluated on a coarse-grained data set using a relatively simple CNN. Experimental results, run using 4-fold cross-validation and reported in terms of Top-1 and Top-5 accuracy, indicate that cropping in geometric augmentation significantly increases CNN task performance.

These results indicate the importance of augmenting coarse-grained training data sets using transformations that alter the geometry of the images rather than just lighting and color.

nuts-flow/ml: data pre-processing for deep learning

mixup: Beyond Empirical Risk Minimization

In this work, we propose mixup, a simple learning principle to alleviate these issues. In essence, mixup trains a neural network on convex combinations of pairs of examples and their labels. By doing so, mixup regularizes the neural network to favor simple linear behavior in-between training examples.

Dataset Augmentation in Feature Space

We start with existing data points and apply simple transformations such as adding noise, interpolating, or extrapolating between them. Our main insight is to perform the transformation not in input space, but in a learned feature space.
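The three transformations named above (adding noise, interpolating, extrapolating) can be sketched as follows. This is an illustrative numpy sketch under stated assumptions: the function and parameter names are hypothetical, and the encoder/decoder that maps to and from the learned feature space is omitted.

```python
import numpy as np

def augment_in_feature_space(c, neighbor, mode="extrapolate",
                             lam=0.5, sigma=0.1, rng=None):
    """Augment a learned feature vector `c` using a same-class
    neighbor's feature vector, via one of three transformations."""
    if rng is None:
        rng = np.random.default_rng()
    if mode == "noise":
        return c + rng.normal(0.0, sigma, size=c.shape)
    if mode == "interpolate":
        return c + lam * (neighbor - c)   # move toward the neighbor
    if mode == "extrapolate":
        return c + lam * (c - neighbor)   # push away from the neighbor
    raise ValueError(f"unknown mode: {mode}")
```

Note that the interpolation case applied in input space, with labels mixed by the same coefficient, is essentially the mixup rule from the previous entry; the insight here is to apply such transformations in the learned feature space instead.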