This is the thirty-second article in a series dedicated to the various aspects of machine learning (ML). Today’s article will discuss the decision-making behind choosing one learning algorithm over another, a key aspect of the machine learning process.
There’s more than one way to peel an orange. You can jam your thumb into a random spot on the orange and tear from that spot. You can start at the orange stem, and gently peel from there, from top to bottom (or bottom to top). You can peel in a neat, ribbony fashion that leaves the whole peel intact, or you can claw away so that the orange peel is left in a variety of savaged scraps. The world is your oyster, or, um, orange, when it comes to the peeling of an orange.
No matter your preferred method, the result is always the same: The orange is peeled, and you still need to strip away a lot of the “pith,” which is the white, stringy gunk that surrounds the delicious orange.
But is every result the same, though? Sure, the orange is peeled at the end of the peel sesh, but some peeling methods leave the soft and delicate fruit in a worse condition than others. For example, a jab-and-claw method will likely bruise the fruit, causing unnecessary punctures in the fruit itself that will be less pleasurable to chew than the orange that is left relatively unscathed by a top-to-bottom/bottom-to-top ribbon peel.
The lesson here is that it is often the case that the mere achievement of a goal is not all that one should be concerned with. More often than not, how you do something is just as important as what you are doing to accomplish a goal. How you peel an orange is just as important as peeling the orange.
This principle is especially true in the field of machine learning, where learning algorithms determine the how of an agent’s learning of hypotheses.
So, what are the necessary criteria for comparing machine learning algorithms, and how is the comparison done? Read on for an overview.
Choosing Favorites among ML Algorithms
The performance of a machine learning algorithm is the chief concern when deciding how an ML agent should learn.
Usually, the criteria for determining which algorithm performs best involve asking how well an agent learns to do some task T. This is measured by comparing the performance of the candidate algorithms on the same data, ideally held-out test data rather than the training sets themselves.
What will (hopefully) be discovered is the difference in error rate between the algorithms, and whether that difference is statistically significant. If only two algorithms are being compared, the first part of the math is quite easy: find the difference between the error rates of the two algorithms.
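The error-rate comparison can be sketched in a few lines. The labels and predictions below are hypothetical stand-ins for results collected from two algorithms on the same held-out test set:

```python
def error_rate(predictions, labels):
    """Fraction of examples the algorithm got wrong."""
    wrong = sum(p != y for p, y in zip(predictions, labels))
    return wrong / len(labels)

# Hypothetical test-set labels and the two algorithms' predictions.
labels = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
algo_a = [1, 0, 1, 0, 0, 1, 0, 0, 1, 1]  # 1 mistake -> 0.1 error rate
algo_b = [1, 0, 0, 0, 0, 1, 0, 1, 1, 1]  # 3 mistakes -> 0.3 error rate

difference = error_rate(algo_a, labels) - error_rate(algo_b, labels)
print(round(difference, 2))  # -0.2: algorithm A makes fewer errors
```

A negative difference favors the first algorithm; the sign tells you which direction the gap runs, and the magnitude tells you how large it is.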
Let’s keep our orange peeling example running. We’re tired of peeling oranges with our feeble human hands, and want a big strong robot to do it for us instead. So, we invest in an artificial intelligence robot that peels oranges, and train it ourselves. Its developers have a number of learning algorithms designed to help it learn the art of orange peeling, so we choose two at random, Peel 1 and Peel 2.
The difference in performance between Peel 1 and Peel 2 is not immediately drastic. There are a few botched oranges left by the robot when using either algorithm, so a closer look is needed.
Fifty oranges were peeled under each algorithm. Peel 1 left 4 botched oranges, and Peel 2 left 3 botched oranges. Based on numbers alone, we should go with Peel 2. But examine the oranges and their discarded peels, and you notice a clear progression in Peel 1 that is absent in Peel 2. Peel 1 more easily and quickly figured out that the ribbon-peel method, though time-costly, ensures a utility-boosting unharmed orange. Peel 2, meanwhile, kept choosing the jab-and-claw method in the interest of saving time, leaving an army of bruised, but not ruined, oranges in its wake.
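Before reading too much into that 4-versus-3 gap, it helps to ask whether it is statistically significant at all. A minimal sketch, using a standard two-proportion z-test on the trial numbers above (the test choice is an illustrative assumption, not a prescribed method):

```python
import math

n = 50
p1 = 4 / n  # Peel 1 botch rate: 0.08
p2 = 3 / n  # Peel 2 botch rate: 0.06

# Pool the botches across both trials to estimate the shared rate,
# then compute the standard error of the difference in proportions.
pooled = (4 + 3) / (2 * n)
se = math.sqrt(pooled * (1 - pooled) * (2 / n))
z = (p1 - p2) / se

print(round(p1 - p2, 2))  # 0.02
print(round(z, 2))        # ~0.39, far below 1.96 (the 5% threshold)
```

With a z-score well under 1.96, the one-orange gap is nowhere near significant at the 5% level, which is exactly why a closer qualitative look at the peels is warranted.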
So, despite the total-botch error rate being higher for Peel 1, you still choose that algorithm because it produces a better orange-peeling method overall. As is often the case, observing the small details of a task alongside the raw numbers can be instrumental in deciding which algorithm to choose.
Choosing one algorithm over another involves looking at both the raw numbers and the observable details of each algorithm's results. When the difference in error rate between two algorithms is not drastic, it is often best to choose the algorithm that leads agents to utility-maximizing performance, even if that algorithm produces slightly more errors in the long run.
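One way to make "utility-maximizing performance" concrete is to score every outcome, not just the outright failures. The sketch below assigns a utility to each orange's final condition; the bruise counts and utility values are hypothetical assumptions, chosen only to illustrate how Peel 1 can win despite more total botches:

```python
# Hypothetical utilities per orange condition (assumed values).
UTILITY = {"unharmed": 1.0, "bruised": 0.5, "botched": 0.0}

def total_utility(outcomes):
    """Sum the utility of each peeled orange's final condition."""
    return sum(UTILITY[condition] * count for condition, count in outcomes.items())

# Assumed breakdown of the 50-orange trials from the example above.
peel_1 = {"unharmed": 46, "bruised": 0,  "botched": 4}   # ribbon peel
peel_2 = {"unharmed": 7,  "bruised": 40, "botched": 3}   # jab-and-claw

print(total_utility(peel_1))  # 46.0
print(total_utility(peel_2))  # 27.0
```

Under this scoring, Peel 1 comes out well ahead even though it botched one more orange, which mirrors the reasoning in the example: raw error counts alone can point at the wrong winner.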