This is the ninth article in a series dedicated to the various aspects of machine learning. Today’s article continues the discussion on learning algorithms, with a focus on ensemble learning, which can help agents work around complications brought about by decision tree algorithms and, more generally, lets them draw on many hypotheses at once instead of just one. There are many forms of ensemble learning, and this piece will provide an overview of some of the most common forms used by ML agents.
The last Machine Learning article ended by covering some of the biggest drawbacks to using decision tree algorithms. Some trees are too “deep” to run on a consistent basis because of how expensive they are to evaluate (memory is currency for AI agents). Adding just one new example to a tree can also cause major structural damage, beginning at the “root.” Other times, a tree can lead to choosing less-than-best decisions in the interest of saving time.
We said that some of these issues can be helped, if not resolved, by models like random forests. We’ll get into what a random forest model is shortly, but first we need to back up and get a bigger picture so that we don’t miss the forest for the trees. The concept we need to know to learn about random forests is ensemble learning.
Ensemble Learning
Ensemble learning is significant because it takes multiple hypotheses (an ensemble), rather than just one, where a hypothesis is a model built from data that represents how to solve a problem, and combines their predictions through a machine learning process such as voting or averaging.
The individual hypotheses are referred to as base models, and the finished combined product is the ensemble model.
So why combine them all? Why not just pick the hypothesis that seems the most likely or realistic? One reason is that it can reduce bias, because some base models do have a heavy bias towards certain decisions. But an ensemble model can express the range of considerations at hand, and offer a more well-rounded picture of what ought to be done.
Another reason is that combining predictions can combat errors. If you are working with a machine learning agent that classifies pictures of owls, then having, say, four hypotheses at work can ensure that the owl is labeled an owl, rather than a penguin, at a (hopefully) higher rate than a single base model will.
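To make that combining step concrete, here is a minimal sketch of a majority vote in Python, assuming four hypothetical base models have already produced their (made-up) labels for one picture:

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label that most of the base models agreed on."""
    return Counter(predictions).most_common(1)[0][0]

# Made-up labels from four hypothetical base models for a single picture.
base_model_predictions = ["owl", "owl", "penguin", "owl"]
print(majority_vote(base_model_predictions))  # -> owl
```

One stray “penguin” vote gets outvoted by the other three hypotheses, which is exactly the kind of error-canceling the ensemble is there for.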
So, on to some methods of ensemble creation (one of which is random forests!).
Bagging
This method creates a number of new training sets by sampling examples, with replacement, from a preexisting data set. Each resampled set is used to train a different hypothesis, and the hypotheses’ predictions are then combined.
Bagging is good for minimizing error and works well with limited data problems or in cases of overfitting (i.e., the ML agent is too used to a training data set and struggles with new, real-world input data).
It is also a useful method for decision trees, since it lets the agent grow many different trees from the same underlying data.
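As a rough sketch of what this looks like in practice, the snippet below uses scikit-learn’s BaggingClassifier with decision trees as the base models; the iris data set is only a stand-in for whatever training data the agent actually has:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)  # stand-in training data

# Each of the 10 trees is trained on a bootstrap sample (examples drawn
# with replacement from the original set); predictions are combined by vote.
bagged_trees = BaggingClassifier(DecisionTreeClassifier(), n_estimators=10)
bagged_trees.fit(X, y)
print(bagged_trees.predict(X[:3]))
```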
Random Forests
Random forest models are a form of decision tree bagging that makes the individual trees different enough from each other to cut down on error; besides training each tree on a different sample of the data, each tree only gets to consider a random subset of the attributes at each split. If all the trees were too alike, then whenever one made an error, the others would be likely to make the same error.
They are easy to create and help decrease overfitting. Random forests are commonly used in income prediction and credit card default prediction, as well as in mechanical and medical prediction tasks.
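A minimal random forest sketch, again with scikit-learn and the iris data standing in for something like a credit-default or medical data set:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)  # stand-in training data

# 100 trees, each grown on a bootstrap sample and restricted to a random
# subset of the features at every split, combined by majority vote.
forest = RandomForestClassifier(n_estimators=100)
forest.fit(X, y)
print(forest.predict(X[:3]))
```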
Stacking
Bagging draws its combined predictions from different data sets, whereas stacking makes predictions from just one training set, though each base model trained on that set comes from a different model class. So the predictions of a decision tree base model may be combined with the predictions of, say, a clustering base model.
The “stacking” being done is a layering of these base models, with a combining ensemble model, trained on the base models’ outputs, sitting on top of it all.
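Here is a sketch of that layering using scikit-learn’s StackingClassifier; a decision tree and a k-nearest-neighbors model stand in for the two different model classes, with a logistic regression combiner trained on their predictions sitting on top:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)  # stand-in training data

# Bottom layer: base models from different model classes.
# Top layer: a combiner trained on the base models' predictions.
stack = StackingClassifier(
    estimators=[("tree", DecisionTreeClassifier()),
                ("knn", KNeighborsClassifier())],
    final_estimator=LogisticRegression(max_iter=1000),
)
stack.fit(X, y)
print(stack.predict(X[:3]))
```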
Boosting
This is the most popular form of ensemble learning. Each example in a training set is weighted with importance, so some will matter more than others when forming a hypothesis.
The first hypothesis is formed with every training example weighted equally. Some examples will be misclassified, so before the next hypothesis is formed, those examples are given greater weight, in the interest of getting them classified correctly on the next run-through. The end goal is to keep shrinking the error by repeating this reweight-and-retrain loop, then combining all of the resulting hypotheses.
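The reweighting scheme described above is essentially the classic AdaBoost algorithm, and a minimal sketch with scikit-learn’s AdaBoostClassifier (on stand-in data) looks like this:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier

X, y = load_iris(return_X_y=True)  # stand-in training data

# Each of the 50 rounds fits a new weak hypothesis, then increases the
# weights of the examples it misclassified so the next round focuses on them.
booster = AdaBoostClassifier(n_estimators=50)
booster.fit(X, y)
print(booster.predict(X[:3]))
```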
Machine Learning Summary
There are many forms of ensemble learning, but these are some of the most popular types used by ML agents. The important thing to know is that all of these involve getting an informed, bigger-picture prediction based on a variety of hypotheses, rather than having to trust one single hypothesis.