
This is the forty-ninth article in a series dedicated to the various aspects of machine learning (ML). Today’s article reopens our discussion of reinforcement learning, a topic we delved into a fair number of articles ago, to fill you in on some aspects of this very popular form of learning that we did not cover previously. In particular, we will go into Q learning, a form of learning in which an agent must discover for itself the correct way to evaluate its right and wrong actions. 

There is a Japanese proverb that, in its translated form, goes “Get knocked down seven times, get up eight.” 

It can be assumed that this proverb was originally thought up to give humans a platitude about mental fortitude, but it could just as well serve as the official motto of one of the most popular approaches in all of machine learning: reinforcement learning. 

We have covered the basics of reinforcement learning in a previous article, but this article shines a light on some important aspects we did not quite get to last time. 

However, we would be remiss if we did not offer you a quick summary or recap of reinforcement learning. 

Reinforcement learning, or RL, is a form of learning in which an agent forms hypotheses and achieves goals through trial and error. 

If the agent is a robot, it may literally fall down 7 (or more; let’s say a number X) times while trying to perform a task. The strength of its learning algorithm can be measured by whether or not it is able to get up 8 (or X+1) times, or at least fall down less often. (Here, we can take the phrase “fall down” to represent any form of failure an agent may experience.) 

The very phrase “reinforcement learning” suggests an autonomous agent, one that will simply tip its hat to its developers and go out the door into the wild world to learn on its own about all of the unseen data. 

This image, really, is not too far from what an RL agent can do, but the developers, at least during the training stage, do not leave the agent completely alone. Rather, they watch from a distance, in a sense, by maintaining a subtle and not constantly perceptible (for the agent, at least) “reward system” that functions a bit like treats for a dog: the agent is “rewarded” with a computer equivalent of “good job” when it does a good job. Only, these rewards are given pretty inconsistently, so the agent will not always know whether it is doing a good job or not. It is guided by occasional hints. 

So, the burden is on the agent to formulate its own system of evaluation, just as it is expected to form hypotheses. This is done through a process called Q learning. 

Q Learning

Q learning builds an evaluation function that estimates the expected reward, positive or negative, of taking certain actions in certain states. 
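In its common tabular form, this evaluation function can be sketched as a table of Q values, one per (state, action) pair, nudged toward the observed reward plus the discounted value of the best action available afterward. This is a minimal sketch, not the only formulation; the learning rate and discount factor below are illustrative choices:

```python
from collections import defaultdict

alpha, gamma = 0.1, 0.9        # illustrative learning rate and discount factor
Q = defaultdict(float)         # Q[(state, action)]: expected reward, defaults to 0.0

def update(state, action, reward, next_state, actions):
    """Nudge Q(state, action) toward the observed reward plus the
    discounted best Q value reachable from the next state."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
```

For example, if the agent takes action "right" in state "s0", lands in "s1", and earns a reward of 1.0, the update moves Q[("s0", "right")] a fraction alpha of the way toward that observed value.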

The agent does not need to know a single thing beforehand about its actions, its environment, or the possible states it could be in—in other words, it can get by with Q learning without a domain theory. 

Like many machine learning algorithms, Q learning is at bottom about prediction: predicting what an action will do to change a certain state, and whether this change will be helpful with regard to reaching a goal or not. 

Overall, Q learning helps agents with the problem of limited knowledge. In reinforcement learning, the agent is basically left to its own devices when it comes to accomplishing a goal, with only occasional hints as to whether it is on the right track or not. In the midst of this ambiguity, a method like Q learning can help it keep track of the very valuable rewards that developers offer it, and make strong, thoughtful predictions based on this very limited knowledge. 
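To make this concrete, here is a minimal, self-contained sketch of tabular Q learning on a hypothetical five-state corridor where the only reward sits at the far end; every name and constant here is our own illustrative choice, not part of any standard library:

```python
import random
from collections import defaultdict

# Hypothetical corridor: states 0..4, the only (sparse) reward is at state 4.
N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)                       # step left or step right
alpha, gamma, epsilon = 0.5, 0.9, 0.2    # illustrative hyperparameters

Q = defaultdict(float)                   # Q[(state, action)], defaults to 0.0
random.seed(0)                           # reproducible run

def step(state, action):
    """Move within the corridor; reward appears only at the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    return next_state, (1.0 if next_state == GOAL else 0.0)

def choose(state):
    """Epsilon-greedy: usually exploit, occasionally explore.
    Ties are broken at random, so early episodes behave like a random walk."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    best = max(Q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(state, a)] == best])

for episode in range(200):
    state = 0
    while state != GOAL:
        action = choose(state)
        next_state, reward = step(state, action)
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = next_state
```

Even though the agent is told "good job" only upon reaching the goal, the discounted update gradually propagates that value backward through the table, so after training the agent prefers the rightward action in every state.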


Reinforcement learning is a trial and error method of learning in which an agent is offered only very occasional “rewards,” positive or negative, for its behavior. These sparse rewards are all the agent has to guide it toward a goal, and the method of Q learning helps agents better divine which kinds of rewards are earned for which actions.