This is the thirty-fifth article in a series dedicated to the various aspects of machine learning (ML). Today’s article will go into detail on Bayesian learning, one of the key approaches to learning in the field. 

If you’ve made it this far in the series, you probably know by now that machine learning involves a lot of educated guesses. An AI agent, like a human being, does not have access to the future, so in lieu of clairvoyance an agent will rely on probability-based reasoning to make inferences about its actions and the impact they might have on the environment. 

There are methods galore for enacting probability-based reasoning, but at the top of the pile is Bayesian learning. It is the Cadillac of probabilistic approaches to inference, a souped-up dream of a method that is still amazingly practical. 

What a Bayesian learning algorithm yields is a calculated probability that speaks to the likelihood of a hypothesis, like “the object hurtling towards me is a car.” Its popularity stems in part from the fact that, on many problems, Bayesian methods are competitive with, or outperform, other approaches, such as decision trees. 

So, without further ado, let’s jump into the nitty-gritty of this renowned probability calculator. 

The Bayesian Method

Bayesian algorithms tend to account for every bit of minutiae during training or real-world action, considering each example and adjusting the probability estimate accordingly. Whereas some algorithms keep or throw out a hypothesis outright, a Bayesian method keeps tabs on every hypothesis, nudging its probability up or down depending on how well it holds up under testing. 
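
To make that concrete, here is a minimal sketch of the idea, assuming a toy setup of my own invention: three hypothetical hypotheses about a coin’s bias, each nudged up or down after every flip we observe.

```python
# Minimal sketch of incremental Bayesian updating (illustrative; not from
# any particular library). Three hypothetical hypotheses about a coin's
# bias, each re-weighted after every observed flip.

# Hypotheses: the probability that the coin lands heads.
hypotheses = {"fair": 0.5, "heads-biased": 0.8, "tails-biased": 0.2}

# Start with a uniform prior over the hypotheses.
posterior = {h: 1 / len(hypotheses) for h in hypotheses}

observations = ["H", "H", "T", "H"]  # toy data

for flip in observations:
    # Likelihood of this flip under each hypothesis.
    likelihood = {h: (p if flip == "H" else 1 - p)
                  for h, p in hypotheses.items()}
    # Bayes' rule: posterior is proportional to likelihood x prior;
    # normalize so the probabilities sum to 1.
    unnormalized = {h: likelihood[h] * posterior[h] for h in hypotheses}
    total = sum(unnormalized.values())
    posterior = {h: u / total for h, u in unnormalized.items()}
    print(flip, {h: round(p, 3) for h, p in posterior.items()})
```

Notice that no hypothesis is ever discarded; an unlucky streak of flips merely drives a hypothesis’s probability toward zero, from which later evidence can still rescue it.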

They can also provide probabilities about probabilities: an agent might conclude, say, that there is a 76% chance that its chance of surviving the car hurtling towards it is only 40%. 
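
If that sentence made your head spin, this sketch may help. It assumes, purely for illustration, that our uncertainty about an unknown probability is modeled with a Beta distribution and that scipy is available; neither choice comes from anything specific in this article.

```python
# Sketch: a probability about a probability. We are uncertain about the
# true survival probability p, so we place a distribution over p itself
# (Beta parameters here are assumed for illustration only).
from scipy.stats import beta

belief_over_p = beta(a=4, b=6)  # our belief about the unknown probability p

# "What is the chance that p is at least 0.4?"
print(belief_over_p.sf(0.4))  # P(p >= 0.4), via the survival function
```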

In addition, multiple hypotheses can be combined to “read” a given occurrence and determine which explanation best fits the observed data. 
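
One rough sketch of what combining hypotheses can look like: weight each hypothesis’s prediction for the next observation by how probable that hypothesis currently is. The numbers below are made up, reusing the toy coin setup from earlier.

```python
# Sketch: combining hypotheses by weighting each one's prediction with
# its current probability (made-up numbers, reusing the coin example).

hypotheses = {"fair": 0.5, "heads-biased": 0.8, "tails-biased": 0.2}
posterior = {"fair": 0.30, "heads-biased": 0.62, "tails-biased": 0.08}

# P(next flip is heads) = sum over hypotheses of
#   P(heads | hypothesis) * P(hypothesis | data seen so far)
p_heads = sum(hypotheses[h] * posterior[h] for h in hypotheses)
print(round(p_heads, 3))  # 0.30*0.5 + 0.62*0.8 + 0.08*0.2 = 0.662
```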

As great as these benefits may be, one of the drawbacks of Bayesian algorithms is that, to be effective, they often require many additional probabilities to already be known, like the probability that the agent is driving on the correct side of the road. Bayesian methods are also, alas, quite costly to run from a computational standpoint, though the cost can sometimes be alleviated, depending on the context. 

At this point, you may still be tortured by the million dollar question: What does “Bayesian” even mean? Well, as it turns out, “Bayes” was a human being whose probability theorem serves as the bedrock for the titular machine learning method. 

Bayes’ Theorem

Thomas Bayes is the human being in question, and his theorem provides a way to satisfy one of the most basic desires in calculating probabilities: finding the most likely, or probable, hypothesis based on the available data. 

Bayes’ theorem makes use of two philosophical terms: a priori and a posteriori. The former denotes knowledge that is deduced without experience or observation, whereas the latter deals with knowledge gained through experience and observation. 

Bayes’ theorem combines the a priori probability of a hypothesis, P(h), with the probability that some piece of data will occur at all, P(D), and the probability of that data occurring given that the hypothesis holds, P(D|h), in order to yield the a posteriori probability of the hypothesis: P(h|D) = P(D|h) × P(h) / P(D). 
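
Here is a minimal numeric sketch of the theorem at work, with made-up numbers plugged into the series’ running car example.

```python
# Sketch of Bayes' theorem with made-up numbers for the running car
# example: h = "the object hurtling towards me is a car", D = "the
# object looks large and fast".

p_h = 0.10          # a priori P(h): cars are one of many possible objects
p_d_given_h = 0.90  # P(D | h): cars usually look large and fast
p_d = 0.15          # P(D): how often anything looks large and fast

# Bayes' theorem: P(h | D) = P(D | h) * P(h) / P(D)
p_h_given_d = p_d_given_h * p_h / p_d
print(round(p_h_given_d, 2))  # 0.6: the a posteriori probability of h
```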

Confused? Don’t worry, the crucial thing that you need to understand here is that Bayes’ theorem can tell an agent which hypothesis is the most likely. 

This is useful in machine learning because, at any one time, there could be a slew of hypotheses to choose from, each of which could lead to a different action. Being able to sort through them to find the most educated guess, as sketched below, is an essential component of decision making in uncertain situations. 
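
As a sketch of that sorting process, the most probable (maximum a posteriori, or MAP) hypothesis can be found by comparing P(D|h) × P(h) across the candidates; since P(D) is the same for every hypothesis, it can safely be dropped from the comparison. The candidates and numbers below are illustrative only.

```python
# Sketch: picking the maximum a posteriori (MAP) hypothesis. P(D) is
# identical for every candidate, so comparing P(D|h) * P(h) suffices.
# Priors and likelihoods here are illustrative, not from the article.

candidates = {
    # hypothesis: (prior P(h), likelihood P(D | h))
    "car":    (0.10, 0.90),
    "truck":  (0.05, 0.80),
    "shadow": (0.85, 0.02),
}

map_hypothesis = max(candidates,
                     key=lambda h: candidates[h][0] * candidates[h][1])
print(map_hypothesis)  # "car": 0.10*0.90 = 0.09 beats 0.04 and 0.017
```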

Summary

Bayesian learning is a cornerstone of machine learning, and has been for quite some time now. As the reigning champion of probability-based methods of inference, Bayesian algorithms provide the essential function of deducing which hypothesis is the most likely to be correct. This is done through a calculation that combines the prior probability of a hypothesis, before any data is observed, with the probability of the data itself and the probability of that data occurring if the hypothesis holds, yielding the probability of the hypothesis given that piece of data. At the end of the day, Bayesian algorithms are the king of probability methods in machine learning because they allow the agent to discover which hypothesis is most likely to be correct.