
Using Hierarchical Reinforcement Learning to Balance Conflicting Sub-problems

By: Stephen Robertson

Supervisor: Phil Sterne

Presentation Outline

• Project Motivation

• Project Aim

• Rules of the Gridworld

• Flat Reinforcement Learning

• Feudal Reinforcement Learning

• State Variable Combination Approach

Project Motivation

• Reinforcement Learning is an attractive form of machine learning, but the curse of dimensionality makes it inefficient on complex problems

• Hierarchical Reinforcement Learning is a method for dealing with this curse of dimensionality

Project Aim

• Applying various Hierarchical Reinforcement Learning algorithms to a complex gridworld problem

• Comparing the various algorithms to each other and to flat Reinforcement Learning

Rules of the gridworld

• Possible Actions: Left, Right, Up, Down and Rest

• Collecting food and drink increases nourishment and hydration respectively

• After landing on the tree, the creature is carrying wood which it can use to repair its shelter

Rules of the gridworld

• Resting in a repaired shelter increases health in proportion to the shelter condition

• Landing on the lion decreases health and results in a direct punishment

• After every 4 steps, nourishment, hydration, and shelter condition decrease by 1. After 10 steps, health decreases by 1.
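The decay rule above can be sketched in a few lines of Python. This is only an illustration, not the project's code: the `Creature` class and `tick` function are hypothetical names, the floor at 0 is an assumption, and "after 10 steps" is read here as "after every 10 steps".

```python
from dataclasses import dataclass


@dataclass
class Creature:
    # Field names are illustrative; the slides only name the quantities.
    nourishment: int
    hydration: int
    shelter: int
    health: int
    steps: int = 0


def tick(c: Creature) -> Creature:
    """Advance one time step and apply the periodic decay from the rules."""
    c.steps += 1
    if c.steps % 4 == 0:
        # Every 4 steps: nourishment, hydration and shelter condition drop by 1.
        # Clamping at 0 is an assumption, not stated in the slides.
        c.nourishment = max(0, c.nourishment - 1)
        c.hydration = max(0, c.hydration - 1)
        c.shelter = max(0, c.shelter - 1)
    if c.steps % 10 == 0:
        # Every 10 steps (assumed periodic): health drops by 1.
        c.health = max(0, c.health - 1)
    return c
```

Calling `tick` once per action keeps the decay independent of which action was taken.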

Flat Reinforcement Learning

• Sarsa with eligibility traces was used

• To get Flat Reinforcement Learning working, the task needed to be simplified slightly

• Limited to a 6x6 gridworld

• Nourishment, Hydration, Health and Shelter Condition reduced to 5 discrete levels each

• Total states: 6 x 6 x 5 x 5 x 5 x 5 x 2 = 45000

• Manageable
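As a quick check of the state count, the sketch below multiplies out the dimensions and maps a state tuple to a single table index. The names `DIMS` and `state_index` are hypothetical, and reading the final factor of 2 as a "carrying wood" flag is an assumption; the slides give only the product.

```python
from math import prod

# Sizes per state variable: x, y, nourishment, hydration, health, shelter,
# and a binary flag (assumed here to mean "carrying wood").
DIMS = (6, 6, 5, 5, 5, 5, 2)
NUM_STATES = prod(DIMS)  # 6 x 6 x 5 x 5 x 5 x 5 x 2


def state_index(values):
    """Map a tuple of state-variable values to one flat table index."""
    if len(values) != len(DIMS):
        raise ValueError("expected one value per state variable")
    idx = 0
    for v, size in zip(values, DIMS):
        if not 0 <= v < size:
            raise ValueError(f"value {v} out of range for size {size}")
        idx = idx * size + v  # mixed-radix encoding
    return idx
```

Each extra state variable multiplies the table size, which is the curse of dimensionality the project motivation refers to.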

Flat Reinforcement Learning

• The given task requires a large amount of exploration in order to find the optimal solution

• Total exploration at first, decreasing gradually until finally total exploitation

• Optimistic initialisation of tables to maximum possible reward of 6400 encourages efficient exploration
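A minimal sketch of these ideas follows, assuming a tabular Sarsa(λ) learner with accumulating traces. The linear exploration schedule, the hyperparameter values (`alpha`, `gamma`, `lam`) and all function names are assumptions for illustration; only the use of Sarsa with eligibility traces and the optimistic value 6400 come from the slides.

```python
import random

MAX_REWARD = 6400.0  # optimistic initial value, as stated on the slide
ACTIONS = ("left", "right", "up", "down", "rest")


def make_q_table(num_states):
    """Initialise every action value optimistically."""
    return [[MAX_REWARD] * len(ACTIONS) for _ in range(num_states)]


def epsilon_at(episode, total_episodes):
    """Assumed linear schedule: 1.0 (pure exploration) down to 0.0."""
    if total_episodes <= 1:
        return 0.0
    return max(0.0, 1.0 - episode / (total_episodes - 1))


def choose_action(q_row, epsilon, rng):
    """Epsilon-greedy choice over one row of the Q-table."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=q_row.__getitem__)


def sarsa_lambda_update(q, traces, s, a, reward, s2, a2,
                        alpha=0.1, gamma=0.9, lam=0.8):
    """One Sarsa(lambda) step with accumulating eligibility traces."""
    delta = reward + gamma * q[s2][a2] - q[s][a]
    traces[(s, a)] = traces.get((s, a), 0.0) + 1.0
    for (si, ai), e in list(traces.items()):
        q[si][ai] += alpha * delta * e
        traces[(si, ai)] = gamma * lam * e  # decay the trace
```

Because every untried state-action pair starts at the maximum possible reward, the greedy choice is drawn towards unvisited pairs even once the exploration rate has dropped.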

Flat Reinforcement Learning Results

Feudal Reinforcement Learning

• Needs to be modified for the given problem

• In the simple maze problem, state variables change independently, and don’t change by more than 1

• In the simple maze problem, high level actions can be defined as the same as low level actions

Feudal Reinforcement Learning

• The main difficulty with the complex problem is that high-level actions are hard to define

• State variables can change simultaneously and by more than one, e.g. the creature can move to the left and fully satisfy its hunger in one step, changing two state variables at once

• High-level actions are therefore defined as desired high-level states

Feudal Reinforcement Learning Results

• Feudal reinforcement learning failed horribly

State Variable Combination Approach

• In a problem with conflicting sub-problems, sub-problems tend to be defined by a limited set of state variables

• Sub-agents are created, each in charge of a limited set of state variables

• Some sub-agents will be inherently equipped to solve a sub-problem

• Some sub-agents will not hold any useful information

• By incorporating all possible combinations, we minimise the amount of designer intervention
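To make "all possible combinations" concrete, the sketch below enumerates every non-empty subset of the state variables and shows how a sub-agent would see only its own variables. The variable names, the inclusion of a `carrying_wood` flag, and the helper names are assumptions; the slides do not say whether any subsets were excluded.

```python
from itertools import combinations

# Illustrative names for the state variables of the gridworld.
STATE_VARIABLES = ("x", "y", "nourishment", "hydration",
                   "health", "shelter", "carrying_wood")


def all_variable_subsets(variables):
    """Return every non-empty combination of the given state variables."""
    return [combo
            for size in range(1, len(variables) + 1)
            for combo in combinations(variables, size)]


def project(state, subset):
    """Restrict a full state (a dict) to the variables one sub-agent sees."""
    return tuple(state[name] for name in subset)
```

With 7 variables this yields 2^7 - 1 = 127 candidate sub-agents, most of which see far fewer states than the flat learner.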

Examples of Sub-agents

Choosing between sub-agents

• If the sub-agent that predicts the highest possible reward for a given state is obeyed, the best action should be chosen

• The problem with this is that some sub-agents which do not hold any useful information might falsely predict a high reward

• Reliability of sub-agents also needs to be taken into account

• This is achieved by keeping track of the variance of predicted rewards

• High Variance = Unreliable Prediction

• Low Variance = Reliable Prediction
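One way the reliability idea could be realised is sketched below. The slides do not give the actual selection rule, so the scoring used here (predicted reward minus a penalty times the standard deviation) is an assumption, as are the names `RunningVariance` and `select_sub_agent`.

```python
import math


class RunningVariance:
    """Welford's online algorithm for the variance of a stream of values."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0

    def update(self, x):
        self.n += 1
        d = x - self.mean
        self.mean += d / self.n
        self.m2 += d * (x - self.mean)

    @property
    def variance(self):
        # With fewer than two samples the prediction counts as unreliable.
        return self.m2 / self.n if self.n > 1 else float("inf")


def select_sub_agent(predictions, variances, penalty=1.0):
    """Pick the sub-agent with the best variance-penalised prediction.

    predictions: name -> (proposed_action, predicted_reward)
    variances:   name -> variance of that sub-agent's predicted rewards
    Returns (name, proposed_action).
    """
    def score(name):
        _, predicted = predictions[name]
        return predicted - penalty * math.sqrt(variances[name])

    best = max(predictions, key=score)
    return best, predictions[best][0]
```

Under this scoring, a sub-agent that predicts a high reward but with high variance loses to one that predicts a slightly lower reward reliably.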

Results

Questions?
