
Upload: gillian-green

Post on 04-Jan-2016



Reinforcement Learning Control with Robust Stability

Chuck Anderson, Matt Kretchmar, Department of Computer Science

Peter Young, Department of Electrical and Computer Engineering

Douglas Hittle, Department of Mechanical Engineering

Colorado State University, Fort Collins, CO

Reinforcement Learning Agent in Parallel with Controller

Reinforcement learning algorithm guides adjustment of actor's weights.

IQC places bounding box in weight space of actor network, beyond which stability has not been verified.

Incorporating Time-Varying IQC in Reinforcement Learning

[Figure: weight space (high-dimensional), annotated with the following steps.]

Step 0: initial weight vector

Step 1: initial guaranteed-stable region

Step 2: trajectory of weights while learning

Step 3: must find new stable region

Step 4: next guaranteed-stable region

Step 5: Now learning can continue until edge of new bounding box is encountered. …
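The steps above can be sketched as a loop that only accepts weight updates inside the current verified box and asks for a new box when an update would cross its edge. This is a minimal sketch, not the authors' implementation: `box_is_stable` is a hypothetical stand-in for the IQC analysis (here, a toy rule that accepts a box only if it fits inside the unit disc), and `find_box`, `constrained_learning`, the growth schedule, and the update sequence are all assumptions made for illustration.

```python
import numpy as np

def box_is_stable(center, half_width):
    """Toy stand-in for the IQC test (hypothetical): accept the box only if
    its corner farthest from the origin lies inside the unit disc."""
    farthest = np.abs(center) + half_width
    return float(np.linalg.norm(farthest)) < 1.0

def find_box(center, start=0.01, grow=2.0, max_doublings=20):
    """Steps 1 and 4: grow a box around `center` while the test still passes."""
    if not box_is_stable(center, start):
        return 0.0  # no verified room left around this weight vector
    r = start
    for _ in range(max_doublings):
        if not box_is_stable(center, r * grow):
            break
        r *= grow
    return r

def constrained_learning(w0, updates):
    """Apply proposed weight updates, never leaving a verified box."""
    w = np.asarray(w0, dtype=float)          # Step 0: initial weight vector
    center, r = w.copy(), find_box(w)        # Step 1: initial stable region
    trajectory = [w.copy()]
    for dw in updates:                       # Step 2: learning proceeds
        candidate = w + dw
        if np.any(np.abs(candidate - center) > r):
            # Step 3: edge reached, so a new region is computed around w
            center, r = w.copy(), find_box(w)
            # Step 4: the update is limited to the new verified box
            candidate = np.clip(candidate, center - r, center + r)
        w = candidate                        # Step 5: learning continues
        trajectory.append(w.copy())
    return np.array(trajectory)

# Hypothetical updates that would push the weights out of the stable set.
updates = [np.array([0.1, 0.05])] * 50
constrained = constrained_learning([0.0, 0.0], updates)
unconstrained = np.cumsum(updates, axis=0)
```

In this toy setting the unconstrained trajectory leaves the unit disc, while the constrained one stalls near its boundary. In the actual method the box would come from an IQC/LMI computation rather than a geometric rule.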

[Figure: weight space (high-dimensional), showing the unstable region, the final weight vector, the weight trajectory with robust constraints, and the weight trajectory without robust constraints.]

Trajectory of Weights and Bounds on Regions of Stability

[Figure: regions labeled A, B, C, D, and E, with the initial weight vector marked.]

Motivation

Robust control theory

• Guarantees stability

• Results in less aggressive controllers

Reinforcement learning

• Optimizes the performance of a controller

• No guarantee of stability while learning

Experimental HVAC System

[Figure: schematic of the experimental HVAC system. Labeled components: external air, return air, discharge air, mixing box, filter, fan, variable frequency drive, valve, heating coil, water heater, air flow, interface. Labeled inputs and outputs: T_AE, C_DE, C_DR, T_AR, T_AI, T_WS, C_WH, C_VP, F_W, T_AO, F_A, C_BS, T_WI, T_WO.]

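Reinforcement Learning

$$Q(s_t, a_t) = E\left\{ \sum_{k=0}^{T} \gamma^{k} R(s_{t+k}, a_{t+k}) \right\}$$

Labels: state $s_t$, action $a_t$, policy function, value function $Q$, discount factor $\gamma$, reinforcement $R$ (|error|).

$$\begin{aligned}
Q(s_t, a_t) &= E\left\{ R(s_t, a_t) + \sum_{k=1}^{T} \gamma^{k} R(s_{t+k}, a_{t+k}) \right\} \\
&= E\left\{ R(s_t, a_t) + \gamma \sum_{k=0}^{T-1} \gamma^{k} R(s_{t+k+1}, a_{t+k+1}) \right\} \\
&= E\left\{ R(s_t, a_t) + \gamma\, Q(s_{t+1}, a_{t+1}) \right\}
\end{aligned}$$

Subtract right side from left to get algorithm for updating Q

$$\Delta Q(s_t, a_t) = E\left\{ R(s_t, a_t) + \gamma\, Q(s_{t+1}, a_{t+1}) \right\} - Q(s_t, a_t)$$

Replace expectation with sample (Monte Carlo approach)

$$\Delta Q(s_t, a_t) = \alpha_t \Big[ \underbrace{R(s_t, a_t) + \gamma\, Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t)}_{\text{Temporal-difference error}} \Big]$$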
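As a small illustration of the sampled update, the sketch below applies one temporal-difference step to a tabular value function. The poster uses neural networks rather than a table. The table layout, the function name `td_update`, and the step size and discount values are assumptions chosen for the example.

```python
import numpy as np

def td_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """One sampled update: Q(s,a) += alpha * (r + gamma*Q(s',a') - Q(s,a)).

    Q is a 2-D array indexed by [state, action]; alpha and gamma are
    illustrative values, not taken from the poster.
    """
    td_error = r + gamma * Q[s_next, a_next] - Q[s, a]
    Q[s, a] += alpha * td_error
    return td_error

# Two states, two actions, value estimates initialised to zero.
Q = np.zeros((2, 2))
first_error = td_update(Q, s=0, a=1, r=1.0, s_next=1, a_next=0)
second_error = td_update(Q, s=0, a=1, r=1.0, s_next=1, a_next=0)
```

Repeating the same transition shrinks the temporal-difference error each time, which is the sense in which the sampled update drives the estimate toward the expectation.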

Robust Control based on IQCs

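[Figure: feedback loop between the uncertainties (Δ) and the controller/plant (M), connected by signals v and w.]

An Integral Quadratic Constraint (IQC) describes the relationship between signals as

$$\int_{-\infty}^{\infty} \begin{bmatrix} \hat{v}(j\omega) \\ \hat{w}(j\omega) \end{bmatrix}^{*} \Pi(j\omega) \begin{bmatrix} \hat{v}(j\omega) \\ \hat{w}(j\omega) \end{bmatrix} d\omega \ge 0$$

Stability of the closed-loop system is guaranteed if

$$\begin{bmatrix} M(j\omega) \\ I \end{bmatrix}^{*} \Pi(j\omega) \begin{bmatrix} M(j\omega) \\ I \end{bmatrix} \le -\epsilon I$$

for all $\omega$ and for some $\epsilon > 0$.

Given specific IQCs for a particular system, this inequality problem becomes a linear matrix inequality (LMI) problem.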
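To make the stability condition concrete, the sketch below evaluates the largest eigenvalue of [M(jω); I]* Π(jω) [M(jω); I] on a frequency grid. It is only an illustration under stated assumptions: the plant M(s) = 0.5/(s + 1), the constant multiplier Π = diag(1, −1) (the small-gain case for a norm-bounded uncertainty), and the function name `iqc_margin` are all chosen for the example, and a grid check is not a proof. In practice the condition would be posed as an LMI and handed to a solver.

```python
import numpy as np

def iqc_margin(M, Pi, omegas):
    """Largest eigenvalue, over the grid, of [M(jw); I]^* Pi(w) [M(jw); I].

    A value below -eps for some eps > 0 is consistent with the stability
    condition at the sampled frequencies (it does not certify all of them).
    """
    worst = -np.inf
    for w in omegas:
        Mjw = np.atleast_2d(M(1j * w))
        stacked = np.vstack([Mjw, np.eye(Mjw.shape[1])])
        H = stacked.conj().T @ Pi(w) @ stacked
        H = (H + H.conj().T) / 2  # remove round-off asymmetry
        worst = max(worst, float(np.linalg.eigvalsh(H).max()))
    return worst

# Hypothetical example data.
omegas = np.logspace(-3, 3, 400)
small_gain = lambda w: np.diag([1.0, -1.0])   # Pi = diag(1, -1)
stable_M = lambda s: 0.5 / (s + 1.0)          # peak gain 0.5
risky_M = lambda s: 2.0 / (s + 1.0)           # peak gain 2.0

margin_ok = iqc_margin(stable_M, small_gain, omegas)
margin_bad = iqc_margin(risky_M, small_gain, omegas)
```

With this multiplier the quantity reduces to |M(jω)|² − 1, so the first plant gives a clearly negative margin and the second a positive one.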

[Figure: reference and output for the nominal case (good response) and the perturbed case (terrible response).]

Robust Reinforcement Learning

[Figure: perturbed case, no learning; perturbed case, with learning.]

Through learning, the controller has been fine-tuned to the actual dynamics of the real plant without losing the guarantee of stability!

Controller              Sum Squared Error
Nominal Controller      0.646
Robust Controller       0.286
Robust RL Controller    0.243
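For reference, a sum squared error of this kind can be computed from a reference signal and a measured output as sketched below. The poster does not state the horizon, sampling, or any normalization behind the reported numbers, so the function `sum_squared_error` and the short signals here are purely illustrative.

```python
import numpy as np

def sum_squared_error(reference, output):
    """Sum over time of (reference - output)^2 for equally long signals."""
    reference = np.asarray(reference, dtype=float)
    output = np.asarray(output, dtype=float)
    return float(np.sum((reference - output) ** 2))

# Hypothetical step reference and a response that settles in a few samples.
reference = [1.0, 1.0, 1.0, 1.0]
output = [0.0, 0.5, 0.9, 1.0]
sse = sum_squared_error(reference, output)
```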

Conclusions

• IQC bounds on parameters of tanh and sigmoid networks exist for which the combination of a reinforcement learning agent and feedback control system satisfies the requirements of robust stability theorems (static and dynamic stability).

• Robust reinforcement learning algorithm improves control performance while avoiding instability on several simulated problems.

• Reinforcement learning is now more acceptable in practical applications as an adaptive controller that modifies its behavior over time, due to the guarantees of stability.

• Initial, conservative robust controller becomes more aggressive through adaptation to actual physical system.

• See http://www.cs.colostate.edu/~anderson/res/rl/

Integral Quadratic Constraints

[Figure: block diagrams built around M, with elements indexed 1 and 2.]

Neural Net and Robust Control with IQCs

[Figure legend: bounds on neural net weight adjustment in green; neural net as reinforcement learning actor in blue; robust controller and plant in red.]

First Example

Second Example

Without robust constraints, becomes unstable before learning final stable solution.

Third Example

Distillation Column

Fourth Example

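1st Order

$$x(t+1) = x(t) + u(t), \qquad y(t) = x(t)$$

2nd Order

$$x(t+1) = \begin{bmatrix} 1 & 0.05 \\ 0.05 & 0.9 \end{bmatrix} x(t) + \begin{bmatrix} 0 \\ 1 \end{bmatrix} u(t), \qquad y(t) = \begin{bmatrix} 1 & 0 \end{bmatrix} x(t)$$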

Without robust constraints, becomes unstable before learning final stable solution.
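A discrete-time plant of this form can be simulated in a few lines. The sketch below assumes the 1st-order example reads x(t+1) = x(t) + u(t), y(t) = x(t) once the operators lost in the transcript are restored. The proportional controller with gain 0.5, the unit reference, and the function name `simulate` are hypothetical choices, not taken from the poster.

```python
import numpy as np

def simulate(A, B, C, controller, x0, reference, steps):
    """Simulate x(t+1) = A x(t) + B u(t), y(t) = C x(t) in closed loop.

    A is n-by-n, B is n-by-1, C is 1-by-n (scalars are accepted when n = 1).
    `controller` maps the tracking error reference - y(t) to the input u(t).
    """
    A, B, C = (np.atleast_2d(np.asarray(m, dtype=float)) for m in (A, B, C))
    x = np.asarray(x0, dtype=float).reshape(-1, 1)
    outputs = []
    for _ in range(steps):
        y = (C @ x).item()
        outputs.append(y)
        u = controller(reference - y)
        x = A @ x + B * u
    return np.array(outputs)

# Assumed 1st-order plant with a hypothetical proportional controller.
y = simulate(A=1.0, B=1.0, C=1.0,
             controller=lambda error: 0.5 * error,
             x0=[0.0], reference=1.0, steps=25)
```

With this gain the tracking error halves at every step. A learning agent acting in parallel with such a controller would add its own term to u(t).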