
This is the twenty-ninth article in a series dedicated to the various aspects of machine learning (ML). Today’s article will discuss the problems that can arise when evaluating a learned hypothesis. 

Many Americans have heard the legend about George Washington cutting down his parents’ cherry tree when he was only a small child. The truth of the story is widely disputed; many claim that a posthumous biographer of Washington added it to sell more books and/or push Federalist values like self-discipline. Regardless of whether the story is true or false, it is useful for introducing the machine learning concept of hypothesis evaluation. 

When George Washington cut down his parents’ cherry tree, he probably had the hypothesis that such an act would go over well. Maybe he had cut down a rotting tree or pulled a weed while doing yard work, gained praise for the act, and got it in his head that the removal of any vegetation on the Washington property was praiseworthy. 

When his father came home and grew furious at the sight of the fallen tree, you can probably guess that little George gained a new perspective on his tree-chopping hypothesis, evaluating it as a poor one whose performance does not, in fact, delight his parents. 

However, little George has yet another hypothesis bubbling in his mind after employing it successfully at school earlier that week: framing his mistakes as acts of virtue and humility. 

“Father, I cannot tell a lie, so I must admit that it was I who chopped down the cherry tree. Also, you’re speaking to a future president here so you ought to cut me some slack.” 

Exchange a couple of nouns and verbs, and that is word for word the exact sentence he used to get his math teacher to not give him detention for clapping the chalk dust off the chalkboard erasers. In this case, his hypothesis that the performance of this sentence will get him out of trouble was indeed correct, and this second confirmed success allows him to evaluate it as a good hypothesis. 

But, is it a good hypothesis? Perhaps there are a few problems inherent in his testing of the hypothesis that could lead to an inaccurate evaluation. 

As we like to do in the machine learning series, we will now apply this human process to the world of artificial intelligence. We’ve already been using the appropriate vocabulary (“hypothesis” and “evaluation”), so all we need to do is take a closer look at how the evaluations of a machine learning agent can go wrong. 

Problems with Learned Hypotheses and ML Agents

It’s important to note that we are talking about learned hypotheses, which are the hypotheses an agent forms through its own experience, either in its training phase or beyond. 

Two main problems can crop up to make a hypothesis less than satisfactory: bias and variance. 

The first is bias, which leads to overfitting: an agent’s hypothesis becomes completely dependent on a specific training set. So, a cheese-slicing agent may be used to exerting a certain force on cheese blocks of a certain size, let’s say Brie, but when it is offered a wheel of cheddar of the same size, it has a hard time cutting through it, because the consistency of cheddar is tougher than that of Brie. Its previous hypothesis, that exerting X amount of force is enough for a cheese block of size Y, will be re-evaluated as poor. 
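The cheese example can be sketched in a few lines of code. Everything here is a hypothetical illustration: the numbers, the `learn_force_per_size` helper, and the simple “force is proportional to size” rule are all made up for the sake of the example, not taken from any real system.

```python
# A minimal sketch of an agent overfitting to a biased training set.
# All names and numbers are hypothetical illustrations.

def learn_force_per_size(training_blocks):
    """Fit the hypothesis 'force = k * size' from (size, force) pairs."""
    k = sum(force / size for size, force in training_blocks) / len(training_blocks)
    return lambda size: k * size

# The training set contains only Brie: a soft cheese that needs little force.
brie_blocks = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]  # (size, force that worked)
hypothesis = learn_force_per_size(brie_blocks)

# Suppose a cheddar wheel of size 2.0 actually needs about 9.0 units of force.
predicted = hypothesis(2.0)
print(f"predicted force: {predicted:.1f}, force cheddar needs: 9.0")
# The hypothesis, fit only to Brie, badly underestimates the force required.
```

Nothing about the learned rule is wrong for Brie; the evaluation problem only appears when the agent meets cheeses its biased training set never contained.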

The second is variance. We can bring back little George for this one. Though his two tries of the “I cannot tell a lie” trick, on his math teacher and on his father, were successful, this success may not accurately reflect how the hypothesis will perform in the future. In the context of these examples, he is using the trick as a six-year-old boy talking to older authorities who, due to the boy’s age, are willing to cut him some slack. If Washington were to carry this hypothesis into his later life, it is clear that he would not be let off the hook. 

In a machine learning example, the hypothesis about how much force a cheese wheel deserves based on its size depends entirely on the variety of cheeses the agent is trained on. The success of the hypothesis varies from case to case, so the estimate at any one time may differ from the true accuracy of the hypothesis. Trial and error, then, is necessary for seeing the variety of ways that a hypothesis can or cannot work.
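The point about the estimate differing from the true accuracy can also be sketched in code. The hypothesis, the 20% label noise, and the sample sizes below are all assumptions invented for illustration: the same rule is scored on several small test samples, and each sample gives a different accuracy estimate.

```python
# A minimal sketch of variance in hypothesis evaluation: one fixed hypothesis,
# scored on different random samples, yields different accuracy estimates.
# The rule, noise level, and sample sizes are hypothetical illustrations.
import random

random.seed(42)

hypothesis = lambda size: size > 2.0  # learned rule: big wheels need a hard cut

def make_example():
    """Draw one (size, needs_hard_cut) example; 20% of labels are noisy."""
    size = random.uniform(0.5, 4.0)
    true_label = (size > 2.0) if random.random() > 0.2 else (size <= 2.0)
    return size, true_label

def accuracy(sample):
    return sum(hypothesis(s) == y for s, y in sample) / len(sample)

# Score the SAME hypothesis on five independent ten-example test samples.
estimates = [accuracy([make_example() for _ in range(10)]) for _ in range(5)]
print("accuracy estimates:", estimates)
# With 20% label noise the long-run accuracy of this rule is about 0.8,
# but each small-sample estimate wanders around that value.
```

None of the individual estimates is “the” accuracy of the hypothesis; only repeated trials over varied samples reveal how reliable it actually is.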


In this article we covered how bias and variance in data sets can cause issues in the evaluation of a learned hypothesis. Both problems are closely related, as they concern the diversity of examples that an agent has to draw from in order to evaluate the accuracy of a hypothesis.