Hierarchical Reinforcement Learning


There is a lot of hype around Reinforcement Learning (RL) nowadays, and I believe with good reason: RL has achieved super-human performance in Atari video games and has even beaten the world’s best Go player. RL is also being applied to systems management and configuration, autonomous driving, robotics, and more. Will RL take over the world? Not yet; first, there are many problems to deal with:

Sample efficiency: humans need practice to become decent at a video game, but the amount of practice an RL agent needs is enormous. For a simple game like Breakout, we’re talking about thousands of hours of gameplay. That’s not a problem when you have a powerful server and you’re just playing a video game, but what if you need decent performance and have no access to the environment beforehand? There are several approaches to this problem, such as few-shot learning and learning from demonstrations, but it remains a hot topic in RL research.

Scaling: sample efficiency gets worse as the state and action spaces grow. In tabular Q-learning you keep one column per action and one row per state, so with 10 states and 10 actions you have 100 Q-values to compute. But in a video game, the state is defined by the frames of the game: a 100x100-pixel frame (just to keep the numbers round) has 10,000 pixels, each taking one of 256 values, so there are 256¹⁰⁰⁰⁰ possible states. With 10 actions, that means 10 × 256¹⁰⁰⁰⁰ Q-values… oops. It gets even worse with continuous state or action spaces. Deep RL can handle huge state spaces, but doing so in a sample-efficient manner is still a challenge.
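To make the explosion concrete, here is a minimal NumPy sketch. The numbers are just the illustrative ones from the paragraph above; since 256¹⁰⁰⁰⁰ overflows any native type, the sketch counts in logarithms.

```python
import numpy as np

# Tabular Q-learning: one Q-value per (state, action) pair.
# A tiny 10-state, 10-action problem fits in a 10 x 10 table.
n_states, n_actions = 10, 10
q_table = np.zeros((n_states, n_actions))  # 100 entries, trivially small

# Now count the states of a 100x100 greyscale frame: each of the
# 10,000 pixels takes one of 256 values, giving 256 ** 10_000 frames.
n_pixels = 100 * 100
n_pixel_values = 256
# The count overflows any float, so work with base-10 logarithms instead.
log10_states = n_pixels * np.log10(n_pixel_values)   # a number with ~24,000 digits
log10_q_values = log10_states + np.log10(n_actions)  # one Q-value per action per state

print(f"Tiny problem: {q_table.size} Q-values")
print(f"Raw-pixel problem: ~10^{log10_states:.0f} states, "
      f"~10^{log10_q_values:.0f} Q-values")
```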

Generalization and transfer learning: in addition, RL agents tend to perform well only on the exact task they were trained on. Give them a new task, even a similar one, and performance drops back to terrible.

Hierarchical Reinforcement Learning to the rescue

Let’s say your doorbell rings while you’re reading this. What are you going to do? You might think of it as simply getting up and going to the door, but what you actually have to do is much more complex: you have to control many muscles to stand up, then walk to the door, which again requires controlling your muscles and keeping your balance until you reach it. Yet you only think about the high-level actions (getting up, going to the door), because the low-level actions come naturally.

HRL was born with exactly this in mind. Instead of a single policy that has to achieve the goal of the whole task, we now have several sub-policies that work together in a hierarchical structure, as sketched below. This approach brings many benefits.
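Here is a toy sketch of that two-level structure, in the spirit of the doorbell example: a high-level policy picks a sub-policy ("stand up", "walk to the door"), and the chosen sub-policy emits primitive actions until it hands control back. All the class names, the environment interface, and the random decision rules are illustrative assumptions, not a specific HRL algorithm; in practice both levels would be learned.

```python
import random

class SubPolicy:
    """A low-level policy that handles one sub-task."""
    def __init__(self, name, primitive_actions, horizon):
        self.name = name
        self.primitive_actions = primitive_actions
        self.horizon = horizon  # max primitive steps before handing control back

    def act(self, state):
        # Placeholder decision rule; a real sub-policy would be learned.
        return random.choice(self.primitive_actions)

    def terminated(self, state, steps_taken):
        return steps_taken >= self.horizon


class HighLevelPolicy:
    """Chooses which sub-policy to run next, based on the current state."""
    def __init__(self, sub_policies):
        self.sub_policies = sub_policies

    def select(self, state):
        # Placeholder: a learned high-level policy would pick based on state/goal.
        return random.choice(self.sub_policies)


def run_episode(env_step, initial_state, high_level, max_steps=50):
    """Alternate: pick a sub-policy, let it act until it terminates, repeat."""
    state, total_steps = initial_state, 0
    while total_steps < max_steps:
        option = high_level.select(state)
        steps_in_option = 0
        while not option.terminated(state, steps_in_option) and total_steps < max_steps:
            action = option.act(state)
            state = env_step(state, action)  # environment transition (assumed interface)
            steps_in_option += 1
            total_steps += 1
    return state


if __name__ == "__main__":
    stand_up = SubPolicy("stand up", ["push", "balance"], horizon=3)
    walk = SubPolicy("walk to door", ["step_left", "step_right"], horizon=5)
    manager = HighLevelPolicy([stand_up, walk])
    # Dummy environment: the state is just a step counter.
    final_state = run_episode(lambda s, a: s + 1, 0, manager)
    print("episode finished in state", final_state)
```

The key point is the division of labour: the high-level policy reasons over a few sub-tasks rather than over every muscle twitch, while each sub-policy only has to solve its own short, focused problem.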
