In our previous post, we explored computational cost as one major shortcoming of evolutionary algorithms and reviewed some of the exciting research presented at GECCO 2018 devoted to addressing this issue. The first approach relied on the use of a surrogate model to approximate the quality of candidate solutions in an evolutionary algorithm rather than computing them directly, which can be computationally expensive. The second approach sought to characterize the search landscapes for different types of problems in an effort to extrapolate best practices that can be used to avoid local optima, or suboptimal solutions that cause the algorithm to get “stuck” in practice.
Now, we will review some work presented by our own SparkCognition researchers at GECCO 2018. This work focuses on supervised multiclass classification, a common machine learning task in which a new observation must be assigned to one of three or more known categories, or classes. We discovered that simple techniques can be used in conjunction with evolutionary algorithms to reduce the cost of evaluating individual candidate solutions, while also avoiding poorly performing local optima.
Conceptually, our approach divides a complex problem into a set of simpler problems. The simplified problems can be solved quickly, and their answers can then be aggregated to solve the original problem. The resultant divide and conquer algorithm is one technique employed by Darwin, our automated machine learning platform, to rapidly produce highly optimized models for real-world machine learning tasks.
Divide and Conquer
Specifically, our algorithm transforms a multiclass classification problem into a set of binary classification problems. Whereas multiclass classification problems require a new observation to be assigned to one of three or more classes, binary classification problems only require observations to be assigned to one of two classes. We then apply a traditional evolutionary algorithm, such as NEAT, to produce a classifier for each of the simplified binary classification problems. Finally, the results are aggregated to form a complete solution.
We explored two different methods for transformation and aggregation:
The first method is the one-versus-all approach. For this approach, one binary classifier is created to predict each class in the original problem. That is, the classifier must determine whether a new observation belongs to its specified class or not. Intuitively, this is an easier task, since a successful model only needs to learn the characteristics of that one class, rather than those important for all classes. To classify a new observation, the observation is passed through each binary classifier and the final label is chosen by locating the classifier which most confidently predicted the observation to belong to its class.
One problem with this approach in practice is data imbalance. For each task, a model must learn how to identify observations from one particular class. However, since each model is trained on all of the available data, it inevitably has access many more examples of observations that do not belong to its class on average than ones that do. This can lead to learning problems.
The second approach, one-versus-one, resolves this issue. It instead creates one classifier task to distinguish between each pair of classes. The results from these classifiers can then be aggregated using a simple voting scheme: The class which receives the most “wins” is the predicted class.
Though, the number of observations in each task may not be precisely equal, they will tend to be much more balanced than in the one-versus-all case, resulting in more stable learning. However, this behavior comes at a cost. The number of tasks generated by the one-versus-one transformation is a quadratic function of the number of classes, in comparison to the linear one-versus-all transformation.
The two approaches are summarized in the figure below:
A Perfect Match?
Interestingly, the one-versus-one and one-versus-all transformations are not new inventions. They have been historically used to enable binary classification models, such as the support vector machine, to be applied to multiclass classification problems.
However, in our case, evolutionary algorithms such as NEAT are already capable of addressing multiclass problems. Instead, we employ these techniques based on a few observations:
Evolutionary algorithms thrive on simple problems.
Simpler problems often have simpler solutions.
Simpler solutions are less expensive to evaluate.
Each binary classification task can be solved independently. (Parallelization!)
Results
We applied these two techniques to a variety of popular multiclass classification datasets and compared the results to the original NEAT algorithm. The results are shown in the graph below.
Both the one-versus-one and one-versus-all approaches, when used in combination with NEAT, offer improved performance in nearly every case. In particular, as the number of classes in a classification problem increases, the divide and conquer approaches offer significantly higher accuracy. In the case of the UCI letter dataset, a problem with 26 different classes, the one-versus-one approach provides nearly 4x higher accuracy than vanilla NEAT.
As we will see, the success of our approaches recalls themes from our ongoing discussion. They quietly reduce the computational cost while also crafting simpler search spaces which allow evolutionary algorithms to operate without encountering poor local optima.
Cost of Evaluating Candidate Solutions
To explore how these methods improve the efficiency of the evolutionary algorithm, we documented the total number of generations that were completed by each method. Generations are one measure of the progression of an evolutionary algorithm. One generation corresponds to the creation and evaluation of one set of candidate solutions. We also record the number of neurons and connections present in the final neural network produced by each algorithm in order to measure the complexity of evolved solutions in either case.
The results are shown below:
Recall that one-versus-all and one-versus-one divide a problem into subproblems, each of which is solved by an evolutionary algorithm. As a result, we report two numbers for each of these methods above. The first is the total summed across all constituent subproblems. The second (parenthesized) is the average across the subproblems.
Notice that each individual solution produced by our techniques (parenthesized) contains far fewer nodes and connections than the baseline evolutionary algorithm. This subscribes to our original intuition that the transformed tasks are simpler than the originals. Simpler problems have simpler solutions, and simple solutions take less time to evaluate than complex ones. Consequently, we see an increase in overall evolutionary progress, as measured in generations, in the same amount of wall-clock time.
Cost of Encountering Local Optima
In addition to reducing the cost of any single solution evaluation, evolutionary algorithms are also less susceptible to local optima when solving the simplified tasks.
The following visualization compares the performance of the original NEAT (top) and one of our divide and conquer techniques (bottom). In this illustration, each row corresponds to observations from a single class (there are ten classes total). Each cell corresponds to a single observation within that class. Purple cells (darker) denote samples that were correctly classified by the algorithm, while pink cells (lighter) signify misclassifications.
Here, we observe that the NEAT algorithm can correctly classify most samples from a subset of six classes but cannot discriminate between the remaining four. In fact, we found that even if we give NEAT unlimited time, it never improves upon this behavior. In other words, it is trapped in a local optimum, and any additional runtime can be recognized as wasted computation. Across all of our experiments, we found that NEAT in particular struggles to encode information about more than five or six classes at a time in practice. In contrast, our divide and conquer evolutionary algorithm only requires the evolutionary algorithm to distinguish between two classes a time. As a result, the final model can identify samples from all classes with ease.
Where does that leave us?
In this post, we presented a simple divide and conquer technique which can be utilized to significantly improve the performance of evolutionary algorithms such as NEAT on common classification problems. The technique decomposes complex tasks into simpler ones which can be solved more efficiently while avoiding common pitfalls such as local optima. Additionally, the simplified tasks can be solved in parallel, which allows us to scale up to larger problems.
We presented our research at GECCO and see this work as part of an ongoing effort at SparkCognition and in the wider research community to harness the power of evolutionary computation at scale to solve complex problems. We encourage you to read our full paper. Darwin leverages evolutionary computation and deep learning to automatically create highly optimized machine learning models. It is our vision for the future, a tool which can empower both data scientists and data nonscientists to solve real-world problems.
We look forward to sharing more with you soon!
References
McDonnell, Tyler, et al. “Divide and conquer: neuroevolution for multiclass classification.” Proceedings of the Genetic and Evolutionary Computation Conference. ACM, 2018.