Everything you need to know about model-free and model-based reinforcement learning


Reinforcement learning is one of the exciting branches of artificial intelligence. It plays an important role in game-playing AI systems, modern robots, chip-design systems, and other applications.

There are many different types of reinforcement learning algorithms, but two main categories are “model-based” and “model-free” RL. They are both inspired by our understanding of learning in humans and animals.

Nearly every book on reinforcement learning contains a chapter that explains the differences between model-free and model-based reinforcement learning. But seldom are the biological and evolutionary precedents discussed in books about reinforcement learning algorithms for computers.

Greetings, humanoids

Subscribe to our newsletter now for a weekly recap of our favorite AI stories in your inbox.

I found a very interesting explanation of model-free and model-based RL in The Birth of Intelligence, a book that explores the evolution of intelligence. In a conversation with TechTalks, Daeyeol Lee, neuroscientist and author of The Birth of Intelligence, discussed different modes of reinforcement learning in humans and animals, AI and natural intelligence, and future directions of research.

American psychologist Edward Thorndike proposed the “law of effect,” which became the basis for model-free reinforcement learning


In the late nineteenth century, psychologist Edward Thorndike proposed the “law of effect,” which states that actions with positive effects in a particular situation become more likely to occur again in that situation, and responses that produce negative effects become less likely to occur in the future.

Thorndike explored the law of effect with an experiment in which he placed a cat inside a puzzle box and measured the time it took for the cat to escape it. To escape, the cat had to manipulate a series of gadgets such as strings and levers. Thorndike observed that as the cat interacted with the puzzle box, it learned the behavioral responses that could help it escape. Over time, the cat became faster and faster at escaping the box. Thorndike concluded that the cat learned from the reward and punishments that its actions provided.

The law of effect later paved the way for behaviorism, a branch of psychology that tries to explain human and animal behavior in terms of stimuli and responses.

The law of effect is also the basis for model-free reinforcement learning. In model-free reinforcement learning, an agent perceives the world, takes an action, and measures the reward. The agent usually starts by taking random actions and gradually repeats those that are associated with more rewards.

“You basically look at the state of the world, a snapshot of what the world looks like, and then you take an action. Afterward, you increase or decrease the probability of taking the same action in the given situation depending on its outcome,” Lee said. “That’s basically what model-free reinforcement learning is. The simplest thing you can imagine.”

In model-free reinforcement learning, there’s no direct knowledge or model of the world. The RL agent must directly experience every outcome of each action through trial and error.

American psychologist Edward C. Tolman proposed the idea of “latent learning,” which became the basis of model-based reinforcement learning


Thorndike’s law of effect was prevalent until the 1930s, when Edward Tolman, another psychologist, discovered an important insight while exploring how fast rats could learn to navigate mazes. During his experiments, Tolman realized that animals could learn things about their environment without reinforcement.

For example, when a rat is let loose in a maze, it will freely explore the tunnels and gradually learn the structure of the environment. If the same rat is later reintroduced to the same environment and is provided with a reinforcement signal, such as finding food or searching for the exit, it can reach its goal much quicker than animals who did not have the opportunity to explore the maze. Tolman called this “latent learning.”

Latent learning enables animals and humans to develop a mental representation of their world and simulate hypothetical scenarios in their minds and predict the outcome. This is also the basis of model-based reinforcement learning.

“In model-based reinforcement learning, you develop a model of the world. In terms of computer science, it’s a transition probability, how the world goes from one state to another state depending on what kind of action you produce in it,” Lee said. “When you’re in a given situation where you’ve already learned the model of the environment previously, you’ll do a mental simulation. You’ll basically search through the model you’ve acquired in your brain and try to see what kind of outcome would occur if you take a particular series of actions. And when you find the path of actions that will get you to the goal that you want, you’ll start taking those actions physically.”

The main benefit of model-based reinforcement learning is that it obviates the need for the agent to undergo trial-and-error in its environment. For example, if you hear about an accident that has blocked the road you usually take to work, model-based RL will allow you to do a mental simulation of alternative routes and change your path. With model-free reinforcement learning, the new information would not be of any use to you. You would proceed as usual until you reached the accident scene, and then you would start updating your value function and start exploring other actions.

Model-based reinforcement learning has especially been successful in developing AI systems that can master board games such as chess and Go, where the environment is deterministic.


In some cases, creating a decent model of the environment is either not possible or too difficult. And model-based reinforcement learning can potentially be very time-consuming, which can prove to be dangerous or even fatal in time-sensitive situations.

“Computationally, model-based reinforcement learning is a lot more elaborate. You have to acquire the model, do the mental simulation, and you have to find the trajectory in your neural processes and then take the action,” Lee said.

Lee added, however, that model-based reinforcement learning does not necessarily have to be more complicated than model-free RL.

“What determines the complexity of model-free RL is all the possible combinations of stimulus set and action set,” he said. “As you have more and more states of the world or sensor representation, the pairs that you’re going to have to learn between states and actions are going to increase. Therefore, even though the idea is simple, if there are many states and those states are mapped to different actions, you’ll need a lot of memory.”

On the contrary, in model-based reinforcement learning, the complexity will depend on the model you build. If the environment is really complicated but can be modeled with a relatively simple model that can be acquired quickly, then the simulation would be much simpler and cost-efficient.

“And if the environment tends to change relatively frequently, then rather than trying to relearn the stimulus-action pair associations whenever the world changes, you can have a much more efficient outcome if you’re using model-based reinforcement learning,” Lee said.