This is the thirty-sixth article in a series dedicated to the various aspects of machine learning (ML). Today’s article will go further into detail on Bayesian learning, one of the key approaches to learning in the machine learning field, covering the different ways that the Bayesian method is employed in machine learning. Specifically, we’ll cover how Bayesian learning can be extended from hypothesis prediction to classification.
Bayesian algorithms are the gift that keeps on giving. There are many aspects to this topic, which is why we’re extending the discussion to this second article. What we’ll be doing here is providing an overview of the various ways that Bayesian learning crops up during the machine learning process.
Concept Learning with Thomas Bayes
Bayes' theorem can provide a solid foundation for a concept learning algorithm. Recall that the process of concept learning involves an agent forming a hypothesis based on its experiences, and that Bayesian algorithms are directed towards picking the "best," or most likely, hypothesis.
Our last article showed how Bayesian algorithms select the maximum a posteriori (MAP) hypothesis: the hypothesis that is most likely to be true given that some data D has been observed.
The aim here is that the hypothesis will either be or contain the concept that the machine learning agent wishes to learn, such as the speed that a car needs to drive in order to demolish the agent in a head-on collision.
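To make this concrete, here is a minimal Python sketch of MAP hypothesis selection. The hypothesis names, priors, and likelihoods are invented purely for illustration; a real agent would estimate them from its experiences.

```python
# P(h): prior probability of each hypothesis (illustrative, made-up numbers)
priors = {"h1": 0.3, "h2": 0.5, "h3": 0.2}
# P(D | h): likelihood of the observed data D under each hypothesis
likelihoods = {"h1": 0.2, "h2": 0.4, "h3": 0.9}

def map_hypothesis(priors, likelihoods):
    # Bayes' theorem: P(h | D) is proportional to P(D | h) * P(h).
    # The denominator P(D) is the same for every hypothesis, so we
    # can ignore it when comparing hypotheses against each other.
    return max(priors, key=lambda h: likelihoods[h] * priors[h])

print(map_hypothesis(priors, likelihoods))  # prints "h2"
```

Note that h3 explains the data best on its own (likelihood 0.9), yet h2 wins because its higher prior pulls the posterior in its favor; that interplay between prior belief and observed evidence is the heart of the Bayesian approach.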
Optimal Classifications using Bayes
What is the difference between a hypothesis and a classification?
In machine learning, an agent's hypothesis estimates the chances that some action will succeed in a given instance. Classification is the agent's decision about the instance itself: labeling it a success or a failure.
There are other ways to classify an instance, and uses for classification, but we’ll stick to our simplified binary here for the sake of education.
Now, we don’t want to give the impression that hypotheses and classifications are all that distinct, because it is often the case that hypotheses inform a classification.
That is exactly the case with a Bayes optimal classifier, which combines the probabilities of different hypotheses in order to find a value, v, that is the correct classification of an instance. In fact, no other classification method using the same hypothesis space and the same prior knowledge can, on average, outperform a Bayes optimal classifier; that is what earns it the name "optimal."
Why do we need a probability method for classifications? Simply to quantify how likely the classification is to be correct. Though it may be immediately apparent to a human whether we have failed or completed a task, an AI agent needs to rely on hypotheses, data, and a host of probabilities related to these things.
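The key idea can be shown in a short Python sketch, with invented posterior probabilities. Each hypothesis "votes" for a classification value, and each vote is weighted by how probable that hypothesis is given the data:

```python
# P(h | D): posterior probability of each hypothesis (illustrative numbers)
posteriors = {"h1": 0.4, "h2": 0.3, "h3": 0.3}
# P(v | h): the classification value each hypothesis assigns to the instance
votes = {
    "h1": {"success": 1.0, "failure": 0.0},
    "h2": {"success": 0.0, "failure": 1.0},
    "h3": {"success": 0.0, "failure": 1.0},
}

def bayes_optimal(posteriors, votes, values=("success", "failure")):
    # Weight every hypothesis's vote by its posterior, then pick the
    # value with the greatest total weight.
    return max(values, key=lambda v: sum(votes[h][v] * posteriors[h]
                                         for h in posteriors))

print(bayes_optimal(posteriors, votes))  # prints "failure"
```

Notice the twist: the single most probable hypothesis, h1, says "success," but the combined weight of h2 and h3 (0.6 in total) makes "failure" the optimal classification. This is exactly why combining hypotheses beats trusting the best single one.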
Naive Bayes Classifier
The naive Bayes learner is one of the most effective classification methods out there, at times able to match far more complicated and costly methods such as neural networks.
The Bayesian classification method consists of giving an instance (i.e., an event that just occurred and was observed) the most likely value, based on certain attributes that describe the instance. Let’s say the value is “success,” and the instance was the bottling of a soda bottle. Attributes like level of spillage and the tightness of the cap can be considered descriptive attributes.
The method is called "naive" because it simplifies the classification process by assuming that the attributes are "conditionally independent," which means that the attributes are independent of one another once the value is known. So, this method posits that the probability of observing spillage and a loose cap together, given a value such as "failure," is just the product of the probability of seeing each of those attributes on its own given that value.
As it turns out, even in instances where the attributes cannot be considered conditionally independent, such as in natural language processing, the naive Bayes method is still a strong way of classifying instances.
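Here is a toy naive Bayes classifier built around the bottling example. The training data is invented for illustration: each instance pairs two attributes (spillage level, cap tightness) with a "success" or "failure" label.

```python
from collections import defaultdict

# Invented training data: ((spillage, cap_tightness), label)
train = [
    (("low", "tight"), "success"),
    (("low", "tight"), "success"),
    (("low", "loose"), "success"),
    (("high", "loose"), "failure"),
    (("high", "tight"), "failure"),
    (("high", "loose"), "failure"),
]

def fit(train):
    # Count labels and (attribute position, attribute value, label) triples.
    label_counts = defaultdict(int)
    attr_counts = defaultdict(int)
    for attrs, label in train:
        label_counts[label] += 1
        for i, a in enumerate(attrs):
            attr_counts[(i, a, label)] += 1
    return label_counts, attr_counts

def classify(attrs, label_counts, attr_counts):
    total = sum(label_counts.values())
    best, best_p = None, -1.0
    for label, n in label_counts.items():
        # Naive independence assumption:
        # P(label) * product over attributes of P(attr_i | label)
        p = n / total
        for i, a in enumerate(attrs):
            p *= attr_counts[(i, a, label)] / n
        if p > best_p:
            best, best_p = label, p
    return best

label_counts, attr_counts = fit(train)
print(classify(("low", "tight"), label_counts, attr_counts))   # prints "success"
print(classify(("high", "loose"), label_counts, attr_counts))  # prints "failure"
```

A production implementation would also smooth the counts (e.g., Laplace smoothing) so that an attribute value never seen with a label does not zero out the whole product, but the sketch above is enough to show the independence assumption at work.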
Bayesian learning is one of the key methods for not only hypothesis prediction, but classification as well. When it comes to concept learning, Bayesian learning can be used to hunt out a certain concept that an agent wants to learn, such as the force it needs to exert in order to effectively crack a walnut. In classification, the Bayesian method is used to find the value that most effectively describes an instance, such as "success" or "failure." Because a machine learning agent must itself calculate the probability that a classification is correct, many agents employ methods such as naive Bayes classification to gauge how much certainty they can assign to their judgments.