This is the thirtieth article in a series dedicated to the various aspects of machine learning (ML). Today’s article will dive into the machine learning process of evaluating a learned hypothesis, and what methods an agent employs to do such a thing. 

If you’ve ever been to a casino, you’ve probably seen the rows and rows of slot machines that seem like nothing more than glorified piggy banks that are funded through donations from the people sitting there. People line up with a cup full of coins, or dollar bills, and set up shop for hours at a time in the hopes of becoming jackpot royalty. 

It may seem mysterious or nonsensical to the outside observer, but most of these slot-machine hopefuls are operating on a principle, either borrowed from a get-rich-on-gambling book or founded on their own desire or need for gambling. 

Many times, all you see is someone pumping coin after coin, bill after bill, into a slot machine with no success. This sight is what makes you sneer and head for the roulette table like a real classy gambler. However, what you’re not seeing is the time these gamblers won big, or saw someone win big, just from putting a coin or single bill into the machine. And thus, the principle that keeps them in front of that slot machine was born: “If I sit here and patiently feed this machine money, it will give me a high return on investment. Eventually. Some day. Tomorrow. Just one more day of this and I’m done. Just one more hour. Minute. Until midnight. Just one more. Starting…now. Ah, we can’t end on that note. How much money is left in my wallet? If Little Johnny gets a job would he be able to start paying for his college tuition?”  

Whether you think their sit-and-spend hypothesis about slot machine odds is sage or not, it is still a hypothesis with a level of accuracy that can be measured, rather than accepted or dismissed with a “True” or “False.” 

If you were to give a machine learning agent, RoboGambler8000, $10,000 and the goal to double that at its nearest casino through slot machine wins exclusively, then it would indeed need to test the accuracy of the slot machine hypothesis. 

Evaluating Hypotheses

Our last article covered the problems that pop up in evaluating hypotheses (bias and variance), and this article will cover the process of evaluation itself. 

The main goal of evaluating a hypothesis is to see if it accurately holds for future instances. So, if by blind luck RoboGambler8000 ends up, improbably, hitting the jackpot three out of five times during the first five spins, then it may form the hypothesis that it has a sixty percent chance of winning a jackpot every time it sits down at a slot machine. But, as it continues to spin, and spin, and spin, and spin, up until it has a win, it realizes that the 60% hypothesis is not quite working out for it, and a reevaluation is in order. 

This hits on the difference between sample error and true error. Sample error is the error of the hypothesis measured on the limited data the agent has gathered (five spins, three of them jackpots). True error is the probability that RoboGambler8000’s 60% hypothesis will incorrectly predict the outcome of a spin across the entire unknown number of spins that lie ahead of it (and given how far off 60% is, that particular hypothesis will be wrong quite a bit of the time). 
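To make the gap between a small sample and the long run concrete, here is a minimal sketch in Python. The machine's real jackpot rate is not given anywhere in this article, so `ASSUMED_TRUE_RATE` is a made-up number used purely for illustration, and the helper names are hypothetical.

```python
import random

def jackpot_rate(spins):
    """Fraction of spins that were jackpots (True means jackpot)."""
    return sum(spins) / len(spins)

def simulate_spins(n, true_rate, rng):
    """Simulate n independent spins, each a jackpot with probability true_rate."""
    return [rng.random() < true_rate for _ in range(n)]

# RoboGambler8000's lucky start: three jackpots in five spins.
first_five = [True, False, True, True, False]
small_estimate = jackpot_rate(first_five)

# Hypothetical long-run jackpot rate; an assumption, not a real casino figure.
ASSUMED_TRUE_RATE = 0.01

rng = random.Random(0)  # fixed seed so the run is repeatable
many_spins = simulate_spins(100_000, ASSUMED_TRUE_RATE, rng)
large_estimate = jackpot_rate(many_spins)

print(f"estimate from 5 spins:       {small_estimate:.3f}")
print(f"estimate from 100,000 spins: {large_estimate:.3f}")
```

The five-spin estimate is 0.6 no matter what the machine's real rate is, while the large-sample estimate settles near whatever rate was assumed. That is why a handful of spins is a poor basis for a hypothesis.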

Discovering the rate of expected error can be instrumental in guiding an ML agent to greater success. 

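The sample error informs the true error. A confidence interval expresses how dependable that estimate is: it is a range around the sample error that is expected to contain the true error with a stated probability, such as 95%. The more spins in the sample, the narrower that range becomes. 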
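One common way to put a range around a sample estimate is the normal approximation to the binomial. The sketch below assumes that approximation and a 95% level (z = 1.96). The function name is hypothetical, and the approximation is known to be rough for very small samples like five spins, so treat the small-sample numbers as illustrative only.

```python
import math

def approx_confidence_interval(sample_rate, n, z=1.96):
    """Approximate confidence interval for a rate observed over n trials.

    Uses the normal approximation: rate +/- z * sqrt(rate * (1 - rate) / n),
    clipped to the valid range [0, 1]. z = 1.96 corresponds to about 95%.
    """
    margin = z * math.sqrt(sample_rate * (1 - sample_rate) / n)
    return max(0.0, sample_rate - margin), min(1.0, sample_rate + margin)

# Three jackpots in five spins: the interval is extremely wide.
low_5, high_5 = approx_confidence_interval(0.6, 5)

# The same observed rate over 500 spins would pin the estimate down far more tightly.
low_500, high_500 = approx_confidence_interval(0.6, 500)

print(f"n=5:   [{low_5:.3f}, {high_5:.3f}]")
print(f"n=500: [{low_500:.3f}, {high_500:.3f}]")
```

With only five spins the interval covers most of the possible range, which is another way of saying the 60% figure tells RoboGambler8000 very little.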

Error estimation and prediction are at the heart of hypothesis evaluation. One pillar of prediction is the binomial distribution, which gives the probability that an event E, like a jackpot, happens a given number of times in N independent trials, like N slot machine pulls, when every trial has the same chance of success. Under RoboGambler8000’s hopelessly inaccurate hypothesis, it would expect 60 jackpots per 100 pulls, mistakenly estimating that it will be able to double its money in no time. 
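The binomial distribution can be sketched in a few lines of Python. The formula is the standard one, P(r successes in n trials) = C(n, r) · p^r · (1 − p)^(n − r). The 1% jackpot rate below is a hypothetical figure chosen for illustration, not a number from this article.

```python
import math

def binomial_pmf(r, n, p):
    """Probability of exactly r successes in n independent trials,
    each succeeding with probability p."""
    return math.comb(n, r) * p**r * (1 - p) ** (n - r)

HYPOTHESIS_RATE = 0.6     # RoboGambler8000's belief after five spins
ASSUMED_TRUE_RATE = 0.01  # hypothetical real jackpot rate (an assumption)

# Expected jackpots in 100 pulls under the 60% hypothesis: n * p.
expected_jackpots = 100 * HYPOTHESIS_RATE

# How likely was the lucky start (3 or more jackpots in 5 spins)
# if the machine really pays out at the assumed 1% rate?
prob_lucky_start = sum(
    binomial_pmf(r, 5, ASSUMED_TRUE_RATE) for r in range(3, 6)
)

print(f"expected jackpots per 100 pulls (hypothesis): {expected_jackpots:.0f}")
print(f"chance of the lucky start at a 1% rate: {prob_lucky_start:.2e}")
```

Under the assumed rate, a start like RoboGambler8000's would be roughly a one-in-a-hundred-thousand event, which shows how misleading a small, lucky sample can be.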


Evaluating a hypothesis always consists in seeing whether it works or not, of course, but an ML agent is particularly concerned with how often a hypothesis will work. As a result, error estimation and prediction are the core motivators for an agent evaluating the accuracy of a learned hypothesis.