The practice of software development has produced methodologies such as agile development and lean methodology to tackle the complexity of development, with the objective of improving the quality and efficiency of software creation. Although Deep Learning is built from software, it is a different kind of software, and therefore a different kind of methodology is needed. Deep Learning differs most from traditional software development in that a substantial portion of the process involves the machine learning how to achieve objectives. The developer is not completely out of the equation, but works in concert with the machine to tweak the Deep Learning algorithm.

Deep Learning is a sufficiently rich and complex subject that a process model, or methodology, is required to guide a developer. The methodology addresses the necessary interplay between the need for more training data and the exploration of alternative Deep Learning patterns that drive the discovery of an effective architecture. The methodology is depicted as follows:

We begin with some initial definition of the kind of architecture we wish to train. This will of course be driven by the nature of the data we are training on and the kind of prediction we seek. The latter is guided by the Explanatory patterns and the former by the Feature patterns. There are a variety of ways to optimize our training process; these are guided by the Learning patterns.

After the selection of our network model and the data we plan on training on, the developer is tasked with answering the question of whether an adequate labeled training set is available. This process goes beyond the conventional machine learning practice of dividing the dataset into three sets: a training set, a validation set and a test set. In the first step of the process, if the training error remains high, there are several options that can be pursued: increase the size of the model; train a bit longer (or perform hyper-parameter tuning); and if all else fails, tweak the architecture or attempt a new one. In the second step of the process, the developer validates the training against a validation set. If the error rate there is high, indicating overfitting, the options are to find more data, apply different regularizations, and if all else fails, attempt another architecture. The observation here that differs from conventional machine learning is that Deep Learning has more flexibility: the developer has the additional options of employing a bigger model or using more data. One of the hallmarks of Deep Learning is its scalability, performing well when trained with large data sets.
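The two-step decision process above can be sketched as a simple dispatch function. This is only an illustration of the logic; the error threshold and the returned option lists are placeholders, not values from the source.

```python
def next_actions(train_error, val_error, acceptable=0.05):
    """Hypothetical sketch of the two-step decision loop.

    Step 1: high training error means underfitting; try a bigger model,
            longer training / hyper-parameter tuning, or a new architecture.
    Step 2: high validation error means overfitting; try more data,
            different regularizations, or a new architecture.
    """
    if train_error > acceptable:      # step 1: underfitting
        return ["bigger model", "train longer / tune hyper-parameters",
                "new architecture"]
    if val_error > acceptable:        # step 2: overfitting
        return ["more data", "different regularizations",
                "new architecture"]
    return ["done"]
```

The ordering matters: validation error is only diagnostic once training error is already low.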

Trying a larger model is something a developer has control over; finding more data, unfortunately, poses a more difficult problem. To satisfy this need for more data, one can leverage data from different contexts. In addition, one can employ data synthesis and data augmentation to increase the size of the training data. These approaches, however, lead to domain adaptation issues, so a slight change to the traditional machine learning development model is called for. In this extended approach, the validation and test sets are required to belong to the same context as the target problem. Furthermore, to validate training on this heterogeneous set, another set called the training-validation set is set aside to act as additional validation. This basic process model, inspired by a talk by Andrew Ng, serves as good scaffolding on which to hang the many different patterns that we find in Deep Learning.
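The extended split might look like the following sketch. The function name and the proportions are illustrative choices, not prescriptions from the source; the key property is that validation and test come only from the target context, while the training-validation set is carved out of the heterogeneous training distribution.

```python
import random

def extended_split(target_data, extra_data, seed=0):
    """Illustrative four-way split for training on mixed-context data.

    - validation and test sets: drawn only from the target context
    - training set: remaining target data plus synthesized / augmented /
      other-context data
    - training-validation set: held out from the *training* distribution,
      to detect overfitting to that heterogeneous set
    """
    rng = random.Random(seed)
    target = target_data[:]
    rng.shuffle(target)
    n = len(target)
    val, test = target[:n // 4], target[n // 4:n // 2]
    train = target[n // 2:] + extra_data
    rng.shuffle(train)
    k = len(train) // 10              # 10% held out as training-validation
    train_val, train = train[:k], train[k:]
    return train, train_val, val, test
```

Validating against the training-validation set distinguishes genuine overfitting from a domain-adaptation gap between the mixed training data and the target context.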

As you can see, there are many paths of exploration and many alternative models that may be explored to reach a solution. Furthermore, there is sufficient modularity in Deep Learning that we may compose solutions from previously developed ones. Autoencoders, neural embeddings, Transfer Learning and bootstrapping with pre-trained networks are some of the tools that provide potential shortcuts, reducing the need to train from scratch.

Following are a few recommendations that drive value in this methodology:

The More Experts the Better

The one tried and true way to improve accuracy is to have more networks perform the inferencing and to combine their results. In fact, techniques like Dropout are a means of creating “implicit ensembles” where multiple subsets of superimposed networks cooperate using shared weights.
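The benefit of combining experts can be seen in a toy sketch: the average of several predictions always lies within the range of the individual predictions, so its error can never exceed that of the worst expert. The "experts" here are hypothetical noisy linear scorers, not real networks.

```python
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])   # stand-in for the "true" model
x = np.array([0.2, 0.4, 0.6])
target = float(true_w @ x)

# Three hypothetical experts: each a noisy copy of the true scorer,
# standing in for independently trained networks.
experts = [true_w + rng.normal(scale=0.3, size=3) for _ in range(3)]
predictions = [float(w @ x) for w in experts]

# Ensembling: simply average the experts' outputs.
ensemble = sum(predictions) / len(predictions)
```

When the experts' errors are independent, the average typically does better than a typical member, which is why ensembling is such a reliable accuracy boost.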

Seek Problems where Labeled Data is Abundant

The current state of Deep Learning is that it works well only in a supervised context. The rule of thumb is around 1,000 samples per class. So if you are given a problem where you don’t have enough data to train with, consider an intermediate problem that does have more data, and then run a simpler algorithm on the results of the intermediate problem.

Search for ways to Synthesize Data

Not all data is nicely curated and labeled for machine learning; often the data you have is only weakly tagged. If you can join data from disparate sources to achieve a weakly labeled set, this approach works surprisingly well. The best-known example is Word2Vec, where word representations are learned from the words that happen to appear in proximity to other words.
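The idea can be illustrated without Word2Vec itself: build a co-occurrence matrix from raw text and factorize it, so that words appearing in similar contexts get similar vectors. This count-based sketch is an analogue of what Word2Vec learns predictively; the tiny corpus and the rank-2 factorization are arbitrary choices for illustration.

```python
import numpy as np

corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}

# Symmetric co-occurrence counts with a +/-1 word window: the "labels"
# here come for free from word proximity, not from human annotation.
C = np.zeros((len(vocab), len(vocab)))
for i in range(len(corpus)):
    for j in (i - 1, i + 1):
        if 0 <= j < len(corpus):
            C[idx[corpus[i]], idx[corpus[j]]] += 1

# Low-rank factorization turns counts into dense word vectors.
U, S, _ = np.linalg.svd(C)
vectors = U[:, :2] * S[:2]

def sim(a, b):
    va, vb = vectors[idx[a]], vectors[idx[b]]
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))
```

In this corpus "cat" and "dog" occur in identical contexts ("the … sat"), so their vectors coincide, which is exactly the proximity signal Word2Vec exploits at scale.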

Leverage Pre-trained Networks

One of the spectacular capabilities of Deep Learning networks is that bootstrapping from an existing pre-trained network and training it on a new domain works surprisingly well.
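A minimal sketch of the pattern, with numpy standing in for a real framework: the "pretrained" feature extractor here is just a fixed random projection (in practice it would be something like the convolutional base of an ImageNet-trained network), and only a small new head is trained on the target task. All names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pretrained feature extractor; its weights stay FROZEN.
W_frozen = rng.normal(size=(8, 4))
def features(x):
    return np.tanh(x @ W_frozen)

def log_loss(w, F, y):
    p = 1 / (1 + np.exp(-(F @ w)))
    eps = 1e-9
    return float(-np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))

# New-domain task: toy binary labels; only the head is trained.
X = rng.normal(size=(64, 8))
y = (X[:, 0] > 0).astype(float)
F = features(X)                            # computed once: base is frozen

w_head = np.zeros(4)
initial_loss = log_loss(w_head, F, y)
for _ in range(200):                       # plain gradient descent on the head
    p = 1 / (1 + np.exp(-(F @ w_head)))
    w_head -= 0.5 * F.T @ (p - y) / len(y)
final_loss = log_loss(w_head, F, y)
```

Because the base is frozen, its features can be precomputed once, which is also why fine-tuning only a head is so much cheaper than training from scratch.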

Don’t forget to Augment Data

Data usually has meaning that a human may be aware of but that a machine may never discover. One simple example is a time feature. From the perspective of a human, the day of the week or the time of day may be important attributes; however, a Deep Learning system may never be able to surface these if all it is given is seconds since the Unix epoch.
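The time example can be made concrete with a small feature expansion; the particular attributes chosen here are illustrative.

```python
from datetime import datetime, timezone

def time_features(epoch_seconds):
    """Expand a raw Unix timestamp into attributes a human would consider.

    A network fed only raw seconds is unlikely to recover weekly or daily
    periodicity on its own; surfacing these features makes them learnable.
    """
    t = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return {
        "day_of_week": t.weekday(),      # 0 = Monday
        "hour_of_day": t.hour,
        "is_weekend": t.weekday() >= 5,
    }
```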

Explore Different Regularizations

L1 and L2 regularization are not the only regularizations out there. Explore the different kinds, and perhaps look at applying different regularizations per layer.
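Per-layer regularization is straightforward to express; this sketch computes a combined penalty where each layer gets its own regularizer, with the layer-to-regularizer assignment being an arbitrary choice of the caller.

```python
import numpy as np

def penalty(weights_per_layer, kinds, lam=0.01):
    """Sum a different regularization term for each layer.

    kinds maps each layer to "l1" or "l2"; which layer gets which
    is a design decision, not something fixed by the method.
    """
    total = 0.0
    for W, kind in zip(weights_per_layer, kinds):
        if kind == "l1":
            total += lam * np.abs(W).sum()    # L1: encourages sparsity
        elif kind == "l2":
            total += lam * (W ** 2).sum()     # L2: discourages large weights
        else:
            raise ValueError(f"unknown regularization: {kind}")
    return float(total)
```

An L1 penalty on early layers can act as feature selection, while L2 on later layers keeps the classifier weights small; mixing them per layer is one way to explore beyond a single global regularizer.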

End-to-End Deep Learning is a Hail Mary Play

A lot of researchers love to explore end-to-end Deep Learning research. Unfortunately, the most effective use of Deep Learning has been to couple it with other techniques. AlphaGo would not have been successful had Monte Carlo Tree Search not been employed.
