reinforcement learning hidden markov model - cog sciajyu/teaching/cogs202_sp14/slides/lect6.pdf ·...

Hidden Markov Model Reinforcement Learning Week 6 Presentation Yashodhan, Chun, Ning


Post on 15-Apr-2018


Page 1: Reinforcement Learning Hidden Markov Model - Cog Sciajyu/Teaching/Cogs202_sp14/Slides/lect6.pdf · Hidden Markov Models (HMM) Questions: What are HMMs useful for ? What are some of

Hidden Markov Model
Reinforcement Learning

Week 6 Presentation
Yashodhan, Chun, Ning

Page 2:

Hidden Markov Models (HMM)

Questions:
● What are HMMs useful for?
● What are some of the assumptions underlying HMMs?
● What are the 3 problems for HMMs? Explain each in terms of the coin toss example.

Page 3:

Coin Toss Example

3 coins: C1, C2, C3
Select a coin randomly, flip it, and repeat.

Given only the sequence HTTHTHHT, can we find out the sequence of coins that was chosen?

Coin:     C1  C2  C1  C3  C1  C1  C2  C3
Outcome:  H   T   T   H   T   H   H   T

Page 4:

Hidden Markov Model

Coin:     C1  C2  C1  C3  C1  C1  C2  C3    (state sequence i1, i2, ..., iT)
Outcome:  H   T   T   H   T   H   H   T     (observation sequence O1, O2, ..., OT)

N = number of distinct states (N = 3 here)
M = number of distinct observation symbols (M = 2 here)
T = length of observation sequence (T = 8 here)
Denote the N states by 1, 2, ..., N (state i corresponds to coin i being chosen)
Denote the M observation symbols by V = {v1, ..., vM} (v1 = H, v2 = T)

Page 5:

Hidden Markov Model

Component                  | Meaning                                             | Example
Initial state distribution | Probability of being in state i at t = 1            | For i = 1, this is the probability of choosing coin 1 at t = 1
Transition matrix          | Probability of a transition from state i to state j | a12 is the probability of choosing coin 2 immediately after coin 1
Emission matrix            | Probability of observing vk in state j              | b1(2) is the probability of observing Tails given that coin 1 has been chosen
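The three components can be written down concretely for the coin example. The sketch below is only illustrative: the names `pi`, `A`, `B` and every probability value are assumptions chosen for the example, not numbers from the slides. It samples a coin sequence and a matching outcome sequence of length T = 8.

```python
import random

# Hypothetical parameters for the three-coin example (not from the slides).
SYMBOLS = ["H", "T"]          # v1 = H, v2 = T

pi = [1/3, 1/3, 1/3]          # pi[i]   = P(coin i+1 is chosen at t = 1)
A = [[0.6, 0.2, 0.2],         # A[i][j] = P(coin j+1 next | coin i+1 now)
     [0.3, 0.4, 0.3],
     [0.1, 0.1, 0.8]]
B = [[0.5, 0.5],              # B[j][k] = P(symbol v_{k+1} | coin j+1)
     [0.9, 0.1],
     [0.2, 0.8]]

def sample_hmm(pi, A, B, T, rng):
    """Draw a state sequence and an observation sequence of length T."""
    states, obs = [], []
    state = rng.choices(range(len(pi)), weights=pi)[0]
    for _ in range(T):
        states.append(state)
        # Emit a symbol from the current state's emission row.
        obs.append(rng.choices(range(len(B[state])), weights=B[state])[0])
        # Move to the next state using the current state's transition row.
        state = rng.choices(range(len(A[state])), weights=A[state])[0]
    return states, obs

rng = random.Random(0)
coins, flips = sample_hmm(pi, A, B, T=8, rng=rng)
print("Coin:   ", " ".join(f"C{c + 1}" for c in coins))
print("Outcome:", " ".join(SYMBOLS[o] for o in flips))
```

Only the `Outcome` row would be visible to an observer; the `Coin` row is the hidden state sequence.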

Page 6:

Assumptions

Finite context: the next state depends only on the current state, and each observation depends only on the state that produced it.

Shared distributions: the same transition and emission probabilities apply at every time step.

Page 7:

Three Problems for HMMs

Problem 1 - Probability of the observation sequence

Problem 2 - Choosing the most likely state sequence

Problem 3 - Estimating the parameters of the HMM

Page 8:

Problem 1 - Direct computation

Involves 2T·N^T multiplications. For N = 5, T = 100, this is ~10^72 multiplications.

Derivation steps: marginalization, product rule, conditional independence (C.I.)
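The direct computation sums the joint probability of the observations and a state sequence over every possible state sequence. A minimal sketch, assuming hypothetical three-coin parameters (`pi`, `A`, `B` are illustrative names and values, not from the slides), makes the N^T blow-up visible:

```python
from itertools import product

# Hypothetical three-coin parameters (illustrative, not from the slides).
pi = [1/3, 1/3, 1/3]
A = [[0.6, 0.2, 0.2], [0.3, 0.4, 0.3], [0.1, 0.1, 0.8]]
B = [[0.5, 0.5], [0.9, 0.1], [0.2, 0.8]]   # columns: H, T

def direct_likelihood(pi, A, B, obs):
    """Sum P(O, I) over every one of the N**T state sequences I."""
    N, total = len(pi), 0.0
    for seq in product(range(N), repeat=len(obs)):
        # P(O, I) = pi_{i1} b_{i1}(O1) * prod_t a_{i(t-1) i(t)} b_{i(t)}(Ot)
        p = pi[seq[0]] * B[seq[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= A[seq[t - 1]][seq[t]] * B[seq[t]][obs[t]]
        total += p
    return total

obs = [0, 1, 1, 0, 1, 0, 0, 1]            # HTTHTHHT with H = 0, T = 1
print(direct_likelihood(pi, A, B, obs))   # sums over 3**8 = 6561 sequences
print(f"{2 * 100 * 5**100:.2e}")          # 2T * N**T for N = 5, T = 100
```

For T = 8 and N = 3 the loop is cheap, but the second line shows why the same approach is hopeless at N = 5, T = 100.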

Page 9:

Problem 1 - Forward Vector

Probability of the sequence of the first t observations together with state i at time t

Derivation steps: marginalization, product rule, conditional independence (C.I.)

Base case of recursion

Page 10:

Problem 1 - Forward Vector

Now we can write P(O) in terms of the forward vector

marginalization

Page 11:

Problem 1 - Using Forward Vector

Compute forward vector at t = 1

Compute forward vectors for t = 2 to T

Compute probability of the observation sequence

(Trellis diagrams: time t = 1 .. T on one axis, state i = 1 .. N on the other)

Number of multiplications is of the order of N^2·T
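The three steps can be sketched as follows. The recursion used here is the standard forward recursion (base case alpha_1(i) = pi_i b_i(O1), then alpha_{t+1}(j) = [sum_i alpha_t(i) a_ij] b_j(O_{t+1}), and finally P(O) = sum_i alpha_T(i)); the parameter values and the names `pi`, `A`, `B` are assumptions for illustration only.

```python
# Hypothetical three-coin parameters (illustrative, not from the slides).
pi = [1/3, 1/3, 1/3]
A = [[0.6, 0.2, 0.2], [0.3, 0.4, 0.3], [0.1, 0.1, 0.8]]
B = [[0.5, 0.5], [0.9, 0.1], [0.2, 0.8]]   # columns: H, T

def forward(pi, A, B, obs):
    """Return the forward vectors alpha[t][i] and P(O)."""
    N = len(pi)
    # Step 1: forward vector at t = 1.
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    # Step 2: forward vectors for t = 2 .. T (N*N work per time step).
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(N)) * B[j][o]
                      for j in range(N)])
    # Step 3: probability of the observation sequence.
    return alpha, sum(alpha[-1])

obs = [0, 1, 1, 0, 1, 0, 0, 1]            # HTTHTHHT with H = 0, T = 1
alpha, prob = forward(pi, A, B, obs)
print(prob)
```

Each time step costs about N^2 multiplications, which gives the N^2·T total instead of the exponential cost of direct enumeration.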

Page 12:

Problem 1 - Using Backward Vector

Backward vector: probability of the observation sequence from time t+1 to T, given state i at time t

Compute backward vector at t = T

Compute backward vectors for t = T-1 to 1

Compute probability of the observation sequence

(Trellis diagrams: time t = 1 .. T on one axis, state i = 1 .. N on the other)
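A matching sketch for the backward pass, using the standard recursion (base case beta_T(i) = 1, then beta_t(i) = sum_j a_ij b_j(O_{t+1}) beta_{t+1}(j), and P(O) = sum_i pi_i b_i(O1) beta_1(i)). As before, `pi`, `A`, `B` and their values are illustrative assumptions.

```python
# Hypothetical three-coin parameters (illustrative, not from the slides).
pi = [1/3, 1/3, 1/3]
A = [[0.6, 0.2, 0.2], [0.3, 0.4, 0.3], [0.1, 0.1, 0.8]]
B = [[0.5, 0.5], [0.9, 0.1], [0.2, 0.8]]   # columns: H, T

def backward(pi, A, B, obs):
    """Return the backward vectors beta[t][i] and P(O)."""
    N, T = len(pi), len(obs)
    # Backward vector at t = T: nothing left to observe, so probability 1.
    beta = [[1.0] * N for _ in range(T)]
    # Backward vectors for t = T-1 down to 1 (indices T-2 down to 0).
    for t in range(T - 2, -1, -1):
        for i in range(N):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(N))
    # Combine with the initial distribution and the first emission.
    prob = sum(pi[i] * B[i][obs[0]] * beta[0][i] for i in range(N))
    return beta, prob

obs = [0, 1, 1, 0, 1, 0, 0, 1]            # HTTHTHHT with H = 0, T = 1
beta, prob = backward(pi, A, B, obs)
print(prob)
```

The value printed here should agree with the one obtained from the forward vectors, since both compute P(O).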

Page 13:

Reinforcement Learning

Agent and environment interact at discrete time steps: t = 0, 1, 2, ...
Agent observes state at step t: s_t ∈ S
Produces action at step t: a_t ∈ A(s_t)
Gets resulting reward: r_{t+1} ∈ ℝ and resulting next state: s_{t+1}
Policy at step t, π_t: a mapping from states to action probabilities
π_t(s, a) = probability that a_t = a when s_t = s

Page 14:

Goals and Rewards

● Reward: a single number r_t at each time step
● Agent’s goal: maximize cumulative reward in the long run
● Examples of rewards
  ○ Maze: +1 for escape, -1 for each time step prior to escape
  ○ Walking: proportional to robot’s forward motion
● Focus is on what the robot should achieve, not how it should be achieved
  ○ Chess: reward only for winning, not for achieving sub-goals

Page 15:

Returns - Formalizing the goal

Reward sequence after time t: r_{t+1}, r_{t+2}, r_{t+3}, ...
Return R_t: function of the reward sequence
Maximize expected return

Episodic tasks: R_t = r_{t+1} + r_{t+2} + ... + r_T, where T is the final time step

Continuing tasks (discounted return): R_t = r_{t+1} + γ r_{t+2} + γ^2 r_{t+3} + ... = Σ_{k=0}^{∞} γ^k r_{t+k+1}

where γ, 0 ≤ γ ≤ 1, is the discount rate
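For a finite list of rewards, the discounted return can be accumulated from the last reward backwards. This is a small sketch; the function name `discounted_return` is a hypothetical helper, and the reward lists are made-up inputs.

```python
def discounted_return(rewards, gamma):
    """Compute r_1 + gamma*r_2 + gamma**2*r_3 + ... for a finite reward list."""
    ret = 0.0
    # Work backwards so each step is one multiply and one add.
    for r in reversed(rewards):
        ret = r + gamma * ret
    return ret

print(discounted_return([1, 1, 1], 1.0))       # 3.0 (undiscounted episodic sum)
print(discounted_return([0, 0, 0, -1], 0.5))   # -0.125, i.e. -(0.5 ** 3)
```

With gamma = 1 the function reduces to the plain episodic sum; with gamma < 1 rewards further in the future count for less.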

Page 16:

Example

Failure: the pole falling beyond a critical angle or the cart hitting the end of the track

As an episodic task: episode ends upon failure
  reward = +1 for each step before failure
  return = number of steps before failure

As a continuing task with discounted return
  reward = -1 upon failure, 0 otherwise
  return = -γ^k, for k steps before failure

In either case, return is maximized by avoiding failure for as long as possible

Page 17:

The Markov Property

In the general case, the environment's response at time t+1 may depend on everything that has happened earlier.

If the state has the Markov property, the environment's response at time t+1 depends only on the state and action at time t.

Page 18:

Markov Decision Process (MDP)

Definition: A reinforcement learning task that satisfies the Markov property.

Transition probability:

Expected value of next reward:
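In the notation commonly used for finite MDPs (assumed here, following Sutton and Barto's textbook conventions; the symbols \mathcal{P} and \mathcal{R} are not spelled out in the transcript), the two quantities can be written as:

```latex
% Transition probability: chance of landing in s' after taking a in s
\mathcal{P}^{a}_{ss'} = \Pr\{\, s_{t+1} = s' \mid s_t = s,\ a_t = a \,\}

% Expected value of the next reward for that transition
\mathcal{R}^{a}_{ss'} = E\{\, r_{t+1} \mid s_t = s,\ a_t = a,\ s_{t+1} = s' \,\}
```

Together these two families of numbers specify the one-step dynamics of a finite MDP.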

Page 19:

Recycling Robot MDP

❏ At each time step the robot must decide whether to 1) actively search for a can, 2) remain stationary and wait for someone to bring it a can, or 3) go back home to recharge its battery
❏ Reward = number of cans collected
❏ Searching collects more cans but drains the battery. If the robot runs out of battery, it has to be rescued
❏ The decision is based solely on the energy level of the battery

Page 20:

Recycling Robot MDP, cont’d

❏ Searching beginning with high energy leaves the energy level high with probability α and low with probability 1 - α
❏ Searching beginning with low energy leaves the energy level low with probability β and depleted with probability 1 - β
❏ Each collected can counts as a unit reward
❏ A rescue results in a reward of -3
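One way to hold this MDP in code is a nested table of outcomes. The sketch below rests on several assumptions that are not in the slides: the numeric values of `ALPHA` and `BETA`, the expected can counts `R_SEARCH` and `R_WAIT`, the rule that a rescued robot restarts with a high battery, and the restriction of `recharge` to the low state (as in the textbook version of this example). Only the -3 rescue reward is taken from the slide.

```python
ALPHA, BETA = 0.8, 0.6          # hypothetical search probabilities
R_SEARCH, R_WAIT = 2.0, 1.0     # hypothetical expected cans per step
R_RESCUE = -3.0                 # rescue reward stated on the slide

# MDP[state][action] = list of (probability, next_state, reward)
MDP = {
    "high": {
        "search": [(ALPHA, "high", R_SEARCH), (1 - ALPHA, "low", R_SEARCH)],
        "wait":   [(1.0, "high", R_WAIT)],
    },
    "low": {
        # Battery depleted with probability 1 - BETA: rescued, then
        # (by assumption) returned to the high state.
        "search":   [(BETA, "low", R_SEARCH), (1 - BETA, "high", R_RESCUE)],
        "wait":     [(1.0, "low", R_WAIT)],
        "recharge": [(1.0, "high", 0.0)],
    },
}

def expected_reward(state, action):
    """Expected immediate reward for taking `action` in `state`."""
    return sum(p * r for p, _, r in MDP[state][action])

for state, actions in MDP.items():
    for action in actions:
        print(state, action, expected_reward(state, action))
```

With these made-up numbers, searching on a low battery has an expected immediate reward near zero, because the risk of a rescue cancels the expected cans.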

Page 21:

Recycling Robot MDP, cont’d

Page 22:

State-Value Function for Policy π

❏ State-value function: the expected return when starting in state s and following policy π
❏ Policy: a policy π is a mapping from each state s and action a to the probability π(s, a) of taking action a in state s

Page 23:

State-Value Function, cont’d

Backup diagram

Bellman equation for V^π
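In standard notation the Bellman equation for the state-value function reads as follows. The symbols \mathcal{P}^{a}_{ss'} (transition probability) and \mathcal{R}^{a}_{ss'} (expected next reward) are assumed here from the usual textbook convention, and \gamma is the discount rate.

```latex
V^{\pi}(s) = E_{\pi}\{\, R_t \mid s_t = s \,\}
           = \sum_{a} \pi(s,a) \sum_{s'} \mathcal{P}^{a}_{ss'}
             \left[ \mathcal{R}^{a}_{ss'} + \gamma V^{\pi}(s') \right]
```

The equation expresses the value of a state as an average, over actions and next states, of the immediate reward plus the discounted value of the successor state.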

Page 24:

Action-Value Function for Policy π

Action-value function: expected return starting from state s, taking action a, and then following policy π.

Page 25:

Gridworld

Actions: north, south, east, west
Rewards:
1) -1 for any action that would take the agent off the grid
2) 0 for all other actions, except those that move the agent out of the special states A and B
3) +10 for any action in state A
4) +5 for any action in state B
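The value of each cell under a policy that picks the four actions with equal probability can be estimated by iterative policy evaluation. This sketch fills in details the slide text does not state, all assumed from the textbook version of the example: a 5 x 5 grid, the locations of A and B, the cells they send the agent to, a discount rate of 0.9, and the rule that an off-grid move leaves the agent where it is.

```python
GAMMA = 0.9                      # discount rate (assumed)
SIZE = 5                         # 5 x 5 grid (assumed)
A, A_NEXT = (0, 1), (4, 1)       # state A and where it sends the agent (assumed)
B, B_NEXT = (0, 3), (2, 3)       # state B and where it sends the agent (assumed)
MOVES = [(-1, 0), (1, 0), (0, 1), (0, -1)]   # north, south, east, west

def step(state, move):
    """Return (next_state, reward) for one deterministic move."""
    if state == A:
        return A_NEXT, 10.0
    if state == B:
        return B_NEXT, 5.0
    row, col = state[0] + move[0], state[1] + move[1]
    if 0 <= row < SIZE and 0 <= col < SIZE:
        return (row, col), 0.0
    return state, -1.0           # would leave the grid: stay put, reward -1

def evaluate_random_policy(tol=1e-6):
    """Iterative policy evaluation for the equiprobable random policy."""
    V = {(r, c): 0.0 for r in range(SIZE) for c in range(SIZE)}
    while True:
        delta = 0.0
        for s in V:
            # Bellman backup: average over the four equally likely moves.
            new_v = sum(0.25 * (rew + GAMMA * V[nxt])
                        for nxt, rew in (step(s, m) for m in MOVES))
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < tol:
            return V

V = evaluate_random_policy()
for r in range(SIZE):
    print(" ".join(f"{V[(r, c)]:5.1f}" for c in range(SIZE)))
```

Under these assumptions the value of A comes out below its immediate reward of 10, because every action in A sends the agent to the bottom edge, where bumping into the wall is likely.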

Page 26:

Optimal State-Value Functions

❏ There is always at least one policy that is better than or equal to all others. Such policies are optimal policies, denoted by π*
❏ Optimal state-value function: V*(s) = max_π V^π(s) for all s ∈ S

Page 27:

Bellman optimality equation for V*

Backup diagram

The Bellman optimality equation has a unique solution independent of the policy
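Written out in standard notation, the Bellman optimality equation for the state-value function is shown below. The symbols \mathcal{P}^{a}_{ss'} and \mathcal{R}^{a}_{ss'} for the transition probability and expected next reward are assumed from the usual textbook convention.

```latex
V^{*}(s) = \max_{a} \sum_{s'} \mathcal{P}^{a}_{ss'}
           \left[ \mathcal{R}^{a}_{ss'} + \gamma V^{*}(s') \right]
```

No policy appears on the right-hand side: the average over actions in the Bellman equation for a fixed policy is replaced by a maximum.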

Page 28:

Optimal Action-Value Function

The optimal action-value function Q*(s, a) gives the expected return for taking action a in state s and then following the optimal policy

Page 29:

Bellman optimality equation for Q*

Backup diagram
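For action values, the corresponding optimality equation in standard notation is given below (again assuming \mathcal{P}^{a}_{ss'} and \mathcal{R}^{a}_{ss'} for the transition probability and expected next reward, and \gamma for the discount rate):

```latex
Q^{*}(s,a) = \sum_{s'} \mathcal{P}^{a}_{ss'}
             \left[ \mathcal{R}^{a}_{ss'} + \gamma \max_{a'} Q^{*}(s',a') \right]
```

Here the maximum moves inside the sum: the first action a is fixed, and optimal behaviour starts from the next state.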

Page 30:

Greedy policy

● For each state, there will be one or more actions at which the maximum in the Bellman optimality equation is attained. Any policy that assigns non-zero probability only to those actions is an optimal policy (greedy policy).

● If one uses the optimal value function to evaluate the one-step consequences of actions, then the greedy policy is actually optimal in the long-term sense.

Page 31:

Bellman Optimality Equations for the Recycling Robot
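These equations can be solved numerically by value iteration, after which a greedy policy is read off from the result. The sketch below is not the slide's derivation; it assumes made-up values for `ALPHA`, `BETA`, `GAMMA`, `R_SEARCH` and `R_WAIT`, assumes a rescued robot restarts with a high battery, and allows `recharge` only in the low state. Only the -3 rescue reward comes from the slides.

```python
ALPHA, BETA, GAMMA = 0.8, 0.6, 0.9   # hypothetical probabilities and discount
R_SEARCH, R_WAIT = 2.0, 1.0          # hypothetical expected cans per step
R_RESCUE = -3.0                      # rescue reward stated on the slides

# MDP[state][action] = list of (probability, next_state, reward)
MDP = {
    "high": {
        "search": [(ALPHA, "high", R_SEARCH), (1 - ALPHA, "low", R_SEARCH)],
        "wait":   [(1.0, "high", R_WAIT)],
    },
    "low": {
        "search":   [(BETA, "low", R_SEARCH), (1 - BETA, "high", R_RESCUE)],
        "wait":     [(1.0, "low", R_WAIT)],
        "recharge": [(1.0, "high", 0.0)],
    },
}

def q_value(V, state, action):
    """One-step lookahead: expected reward plus discounted next-state value."""
    return sum(p * (r + GAMMA * V[nxt]) for p, nxt, r in MDP[state][action])

def value_iteration(tol=1e-9):
    """Repeat the Bellman optimality backup until the values stop changing."""
    V = {s: 0.0 for s in MDP}
    while True:
        new_V = {s: max(q_value(V, s, a) for a in MDP[s]) for s in MDP}
        if max(abs(new_V[s] - V[s]) for s in MDP) < tol:
            return new_V
        V = new_V

V_star = value_iteration()
# Greedy policy: in each state pick an action that attains the maximum.
greedy = {s: max(MDP[s], key=lambda a: q_value(V_star, s, a)) for s in MDP}
print(V_star)
print(greedy)
```

With these particular numbers the greedy policy searches when the battery is high and recharges when it is low; other parameter choices can change that.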

Page 32:

Yu & Dayan 2005

● Expected Uncertainty and Unexpected Uncertainty
● Acetylcholine and Norepinephrine
● Observed Phenomenon:

Page 33:

Task Paradigm

Page 34:

Model for the Task and Neurochemistry

Page 35:

Generic Internal Model

Page 36:

Three Models

● The Ideal Learner

● The Approximate Inference Model

● The Bottom-up Naive Model

Page 37:

Model Comparison

Cost:

Page 38:

NE and ACh, using the Approximate Model

Page 39:

Without Depletion of NE or ACh

Page 40:

With Depletion of NE or ACh

Page 41:

Different “Depletion” Scenario