This is the forty-fourth article in a series dedicated to the various aspects of machine learning (ML). Today’s article will continue our lesson on the concept of learning sets of rules in machine learning. We will dive deeper into the process of learning a “set” of rules, which we glimpsed in our previous article. This includes a discussion of how to understand the meanings of sentences in FOL, or “First Order Logic.”
Our last article covered a specific type of learning employed by humans and machines alike (though, of course, it is humans who create the algorithms that determine how a computer learns): learning “sets” of rules to navigate uncertain environments.
These sets of rules consist of “if-then” rules that account for the different twists and turns an agent may encounter on the way to accomplishing a goal.
We mentioned that, typically, agents learning sets of rules tend to use what are called sequential covering algorithms, where the rules are learned in a sequence. This means that an agent will learn one rule at a time.
During training, an agent is exposed to a certain amount of data, and a sequential covering algorithm has the agent devise just a single rule from all of that data.
Then, once it has a rule, the algorithm will have the agent make another rule based off the data that the first rule did not cover, repeating this process until the agent has formulated a set of rules that cover as much of the data as the developer desires.
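The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not a production algorithm: the `learn_one_rule` helper, the rule encoding, and the toy feature dictionaries are all made up for the example.

```python
from collections import Counter

def learn_one_rule(examples):
    """Hypothetical stand-in for a real rule learner: pick the single
    attribute/value pair most common among positive examples and turn
    it into an if-then rule."""
    counts = Counter()
    for features, label in examples:
        if label:
            counts.update(features.items())
    (attr, value), _ = counts.most_common(1)[0]
    return (attr, value)  # rule: IF attr == value THEN positive

def covers(rule, features):
    """Does this rule fire on this example?"""
    attr, value = rule
    return features.get(attr) == value

def sequential_covering(examples, max_rules=5):
    """Learn one rule, remove the examples it covers, and repeat
    until no positive examples remain (or the rule budget runs out)."""
    rules = []
    remaining = list(examples)
    while remaining and len(rules) < max_rules:
        if not any(label for _, label in remaining):
            break  # nothing positive left to cover
        rule = learn_one_rule(remaining)
        rules.append(rule)
        remaining = [(f, l) for f, l in remaining if not covers(rule, f)]
    return rules
```

On a toy weather data set where every sunny day is a positive example, a single rule such as `("sky", "sunny")` covers all the positives, so the loop stops after one iteration.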
Sequential covering is not the only way to learn rule sets, however. It is probably the most popular, largely because it simplifies the process of learning many rules, but it has its drawbacks: it does not always produce a small, or even entirely accurate, rule set, which can be a problem.
For that reason, some developers prefer decision trees, which we have covered in this series before. Here is a quick review: decision tree algorithms learn a whole host of options at once, then analyze the various “trees” of decisions to figure out which one most accurately fits the data.
So, instead of learning one rule at a time, the algorithm makes a number of predictions based on the data all at once, having a full set ready at the go.
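To contrast with the one-rule-at-a-time loop, here is a stripped-down decision tree builder in the style of ID3. It is a sketch under simplifying assumptions (categorical features, a toy entropy-based split criterion, majority-vote leaves); real libraries do considerably more.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_attribute(examples, attributes):
    """Pick the attribute whose split gives the lowest weighted entropy,
    i.e. the split that considers the whole data set at once."""
    def split_entropy(attr):
        groups = {}
        for features, label in examples:
            groups.setdefault(features[attr], []).append(label)
        n = len(examples)
        return sum(len(g) / n * entropy(g) for g in groups.values())
    return min(attributes, key=split_entropy)

def build_tree(examples, attributes):
    """Recursively grow a tree: a leaf is a predicted label, an inner
    node is (attribute, {value: subtree})."""
    labels = [label for _, label in examples]
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]  # majority-label leaf
    attr = best_attribute(examples, attributes)
    rest = [a for a in attributes if a != attr]
    branches = {}
    for features, label in examples:
        branches.setdefault(features[attr], []).append((features, label))
    return (attr, {v: build_tree(sub, rest) for v, sub in branches.items()})
```

The key difference from sequential covering is visible in `best_attribute`: every candidate split is scored against the entire data set before any branch is committed to, rather than carving off one rule’s worth of data at a time.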
We also mentioned learning rules in First Order Logic in our last article, which needs to be expounded on.
Learning rules in FOL requires an algorithm of its own, one that conducts an inductive general-to-specific search to create what are called “Horn” clauses: disjunctive sentences of logic with at most one positive literal.
Here are some logic terms to help you decipher that definition:
Constant: A thing that does not change no matter how it is described in a sentence. So, John is still John no matter what. Who John is in relation to other things, however, can change.
Predicate: Confirms or denies a relation between constants. So, in a logic sentence saying that John is employed by Mike, “employed by” is the predicate.
Function: Maps a constant (or several) to another term, picking out some attribute of it, like a function that maps John to John’s gender.
Literal: A predicate or the negation of a predicate. So, to say that John is “employed by” or not “employed by” Mike is an example of a literal.
Clause: Disjunction of literals.
Disjunction: A sentence that joins statements with “or,” true whenever at least one of its parts is true.
So, to say that a “Horn clause” has at most one positive literal basically means that the rest of the sentence consists of negative literals.
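Putting the definitions above together, the Horn property is easy to check mechanically. In this sketch a literal is encoded as a made-up `(predicate_string, is_positive)` pair, and a clause is just a list of such literals:

```python
def is_horn_clause(literals):
    """A clause (a disjunction of literals) is a Horn clause when it
    contains at most one positive literal."""
    return sum(1 for _, positive in literals if positive) <= 1

# "NOT employed_by(John, Mike) OR NOT manager(Mike) OR paid(John)"
# has exactly one positive literal, so it is a Horn clause.
clause = [("employed_by(John, Mike)", False),
          ("manager(Mike)", False),
          ("paid(John)", True)]
```

Read as an if-then rule, that example clause says: if John is employed by Mike and Mike is a manager, then John is paid, which is why Horn clauses map so naturally onto the if-then rules an agent learns.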
The task of an FOL algorithm is to get the agent to distinguish between the different kinds of rules, and between the literals and constants, or terms, that make up the many rules in a rule set.
If there are multiple constants of a similar type (like, say, dogs), a machine learning agent will need to distinguish between those constants, typically by subscripting the constant: Dog1, Dog2, …, Dogn.
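Generating those subscripted names is a matter of keeping a counter per base constant. A small sketch (the `ConstantNamer` helper is invented for this example):

```python
from collections import defaultdict

class ConstantNamer:
    """Hands out distinct subscripted names for constants of the
    same type, e.g. Dog1, Dog2, ... for successive dogs."""
    def __init__(self):
        self.counts = defaultdict(int)

    def fresh(self, base):
        self.counts[base] += 1
        return f"{base}{self.counts[base]}"
```

Each call to `fresh("Dog")` yields the next unused name, so two dogs in the same rule set can never be confused with one another.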
The process of sequential covering starts with the whole data set: the agent analyzes it, forms a single rule, then goes on to make a second rule based on the data that the first rule did not cover. The idea is to simplify the learning process while still accounting for as much of the data as needed. When rules are learned in FOL, a logic language, a machine learning agent needs a dedicated algorithm to understand them.