This is the twenty-second article in a series dedicated to the various aspects of machine learning (ML). Today’s article will dive into the issue of bias in machine learning agents, specifically biases in the process of induction. Note that this article does not cover bias in a moral or ethical sense, where an ML agent may make a decision based on bias against someone’s race or gender; that is a subject for another article. Rather, the bias discussed here is agents’ preference for the most efficient methods of decision-making. 

Remember decision trees from earlier in this series? If not, here’s a refresher: Decision tree learning is a machine learning method where the agent sorts through the many decisions available in any particular situation. It mimics, more neatly, humans’ if-then reasoning, like “If a car is driving head-on towards me in my lane, then I will swerve out of its way.” Human drivers think that and swerve out of the way because of a basic survival instinct, but self-driving cars will likely have to reason explicitly that avoiding a car wreck is preferable to being in one, though this reasoning and the ultimate decision to swerve will happen very quickly. 

We previously discussed in our decision tree articles how decisions are made in this method of learning, which basically consists of the agent searching through the tree to find the decision that best fits the current circumstance, like swerving out of the way when facing a head-on collision. The method is called decision “trees” because it can be represented in a treelike form: attributes, like “Cars in lane,” are nodes that branch out to attribute values, like “Car driving towards you,” and each path is followed down to a leaf holding a decision that answers the root problem of what the driver should do next. 
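
To make the treelike form concrete, here is a minimal sketch of the driving example as a nested Python dictionary. The attribute names, values, and decisions (`car_in_lane`, `adjacent_lane_clear`, `brake_hard`, and so on) are hypothetical and chosen only for illustration; a real driving agent would be far more involved.

```python
# Hypothetical toy tree for the driving example.
# Dictionaries are attribute tests, their "branches" keys are attribute
# values, and plain strings are leaves holding the final decision.
TREE = {
    "attribute": "car_in_lane",
    "branches": {
        "driving_towards_you": {
            "attribute": "adjacent_lane_clear",
            "branches": {
                "yes": "swerve",
                "no": "brake_hard",
            },
        },
        "driving_away": "keep_driving",
        "none": "keep_driving",
    },
}


def decide(tree, situation):
    """Follow the branch matching each attribute value until a leaf is reached."""
    node = tree
    while isinstance(node, dict):
        value = situation[node["attribute"]]
        node = node["branches"][value]
    return node


situation = {"car_in_lane": "driving_towards_you", "adjacent_lane_clear": "yes"}
print(decide(TREE, situation))  # prints "swerve"
```

Each call to `decide` walks a single path from the root to one leaf, so the cost of a decision grows with the depth of the tree.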

If you read those articles, you’ll remember that one of the major issues in decision tree learning arises when the decision tree becomes too bulky. Some problems require a large decision tree, but the size of the tree can make it too costly for an agent to run. Though methods like pruning exist to trim off erroneous or nonessential attributes, there is a feature of machine learning agents that can remove the need to spend extra time and power pruning a decision tree, by developing a more manageable, “affordable” tree in the first place. That feature is bias. 

Machine Learning Bias

When we think of the word “bias,” our minds tend to offer us negative examples, like racial profiling or gender-based discrimination. Sadly, such biases have been found to exist in machine learning agents, which we will discuss in a future article, but today’s piece will focus on a different kind of bias in ML agents: a bias for shorter decision trees. 

Inductive bias consists of the assumptions that an agent uses to justify how it classifies situations it has not seen before. If the agent swerves into another car to avoid the one about to crash into it head-on, it will still be damaged, but it will prefer this decision to the more deadly head-on collision. This is because, in the (hopefully rare) instances of possible head-on collisions, the agent assumes that it is preferable to suffer less damage than more. 

The kind of inductive bias that we want to focus on here is an agent’s bias towards short trees rather than complex ones. 

Agents typically don’t work with just one decision tree, but rather with multiple models, some more complex than others. An agent’s machine learning algorithm has multiple approaches for deciding how to search each tree. It can go breadth-first, where it examines every attribute at one level of the tree before moving down to the next level, or depth-first, where it follows one “branch” all the way down to a decision before backing up to try the next. 
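
The difference between the two search orders can be sketched in a few lines of Python. The tree and its labels below are hypothetical, and the two functions are generic textbook traversals rather than the search procedure of any particular learning algorithm.

```python
from collections import deque

# Hypothetical toy tree: dictionaries are attribute tests, strings are decisions.
TREE = {
    "attribute": "car_in_lane",
    "branches": {
        "towards_you": {
            "attribute": "adjacent_lane_clear",
            "branches": {"yes": "swerve", "no": "brake_hard"},
        },
        "none": "keep_driving",
    },
}


def children(node):
    """Return the subtrees below a node (a leaf has none)."""
    return list(node["branches"].values()) if isinstance(node, dict) else []


def label(node):
    """Name a node by its attribute, or by its decision if it is a leaf."""
    return node["attribute"] if isinstance(node, dict) else node


def breadth_first(tree):
    """Visit every node on one level before moving down to the next."""
    order, queue = [], deque([tree])
    while queue:
        node = queue.popleft()
        order.append(label(node))
        queue.extend(children(node))
    return order


def depth_first(tree):
    """Follow one branch down to a decision before backing up."""
    order, stack = [], [tree]
    while stack:
        node = stack.pop()
        order.append(label(node))
        # Reverse so the first branch is popped (and explored) first.
        stack.extend(reversed(children(node)))
    return order


print(breadth_first(TREE))
# ['car_in_lane', 'adjacent_lane_clear', 'keep_driving', 'swerve', 'brake_hard']
print(depth_first(TREE))
# ['car_in_lane', 'adjacent_lane_clear', 'swerve', 'brake_hard', 'keep_driving']
```

Breadth-first reaches the shallow decision `keep_driving` before any of the deeper ones, while depth-first reaches `swerve` first by committing to a single branch.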

Whether breadth or depth is chosen, the hypotheses or decisions that are the shortest to arrive at, meaning those with the fewest “branches” and “leaves” leading to them, will be stored and privileged in memory. 
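
One way to picture this preference is a selection step that, among candidate trees agreeing with the examples seen so far, keeps the smallest. The sketch below assumes the candidates are already given and measures size by counting nodes; the attribute names and examples are hypothetical. It illustrates the bias itself, not how any specific learner builds its trees.

```python
def decide(tree, situation):
    """Walk from the root to a leaf, following the matching attribute values."""
    node = tree
    while isinstance(node, dict):
        node = node["branches"][situation[node["attribute"]]]
    return node


def size(tree):
    """Count every node in the tree: attribute tests plus leaves."""
    if not isinstance(tree, dict):
        return 1
    return 1 + sum(size(child) for child in tree["branches"].values())


def consistent(tree, examples):
    """True if the tree gives the recorded decision for every example."""
    return all(decide(tree, situation) == decision for situation, decision in examples)


def prefer_shortest(candidates, examples):
    """Among candidates that fit the examples, return the one with fewest nodes."""
    fitting = [tree for tree in candidates if consistent(tree, examples)]
    return min(fitting, key=size) if fitting else None


# Hypothetical training examples: (situation, correct decision).
EXAMPLES = [
    ({"oncoming": "yes", "raining": "yes"}, "swerve"),
    ({"oncoming": "yes", "raining": "no"}, "swerve"),
    ({"oncoming": "no", "raining": "yes"}, "keep_driving"),
    ({"oncoming": "no", "raining": "no"}, "keep_driving"),
]

SHORT = {"attribute": "oncoming", "branches": {"yes": "swerve", "no": "keep_driving"}}
LONG = {
    "attribute": "raining",
    "branches": {
        "yes": {"attribute": "oncoming", "branches": {"yes": "swerve", "no": "keep_driving"}},
        "no": {"attribute": "oncoming", "branches": {"yes": "swerve", "no": "keep_driving"}},
    },
}
TOO_SIMPLE = "swerve"  # a single leaf: smallest of all, but it contradicts the examples

best = prefer_shortest([LONG, TOO_SIMPLE, SHORT], EXAMPLES)
print(size(SHORT), size(LONG), best is SHORT)  # prints "3 7 True"
```

Note that the single-leaf tree is rejected even though it is the smallest: the bias only ranks hypotheses that already agree with the data.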

This bias did not originate with machine learning. It is usually attributed to the fourteenth-century philosopher William of Occam, who said, in paraphrase, that one should prefer the simplest hypothesis that fits the available data. This principle is called Occam’s razor. 

It works because, well, it’s easier for the agent. It saves time and memory, and often results in a satisfactory outcome, though maybe not the best one. Occam’s razor is not a guarantee, however. The shortest tree that fits the training set is still fit to the training set, so it can lead to problems in generalization: the agent becomes too used to the data it was trained on and struggles to “branch out” to new hypotheses when it meets previously unseen data. 


AI agents prefer the easy way to the hard way, and in machine learning this means choosing decision trees that are shorter rather than more complex. The impetus for choosing the easy way is that such decision trees are less costly to run in terms of time, memory usage and other such aspects of performance.