
Reinforcement Learning Control with Robust Stability

Chuck Anderson, Matt Kretchmar, Department of Computer Science,

Peter Young, Department of Electrical and Computer Engineering

Douglas Hittle, Department of Mechanical Engineering

Colorado State University, Fort Collins, CO

Reinforcement Learning Agent in Parallel with Controller

Reinforcement learning algorithm guides adjustment of actor's weights.

IQC places bounding box in weight space of actor network, beyond which stability has not been verified.

Incorporating Time-Varying IQC in Reinforcement Learning

[Figure: weight space (high-dimensional)]

Step 0: initial weight vector

Step 1: initial guaranteed-stable region

Step 2: trajectory of weights while learning

Step 3: must find new stable region

Step 4: next guaranteed-stable region

Step 5 …: Now learning can continue until the edge of the new bounding box is encountered.
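The learn-until-the-edge, then re-verify cycle described above can be sketched in a few lines. This is only an illustration under stated assumptions: `is_stable` is a hypothetical toy stand-in for the IQC-based stability test (it treats the ball of radius 3 in weight space as the verifiable region), and `find_box`, `learn_with_bounds`, and the fixed target used as a learning signal are invented names, not the authors' implementation.

```python
import numpy as np

def is_stable(center, half_width):
    """Hypothetical stand-in for the IQC stability test.

    Toy rule (an assumption): a box is verified if it lies entirely
    inside the ball ||w|| < 3. The farthest corner decides.
    """
    corner = np.abs(center) + half_width
    return np.linalg.norm(corner) < 3.0

def find_box(center, start=1.0, shrink=0.5, min_width=1e-3):
    """Shrink a box around `center` until it passes the test; None if none is found."""
    h = start
    while h >= min_width:
        half = np.full_like(center, h)
        if is_stable(center, half):
            return half
        h *= shrink
    return None

def learn_with_bounds(w0, update, n_updates=200):
    """Apply learning updates, never leaving the currently verified box."""
    w = np.asarray(w0, dtype=float).copy()
    center, half = w.copy(), find_box(w)
    assert half is not None, "initial weights must lie in a verifiable region"
    n_boxes = 1
    for _ in range(n_updates):
        proposal = w + update(w)
        if np.all(np.abs(proposal - center) <= half):
            w = proposal                      # still inside the verified box
            continue
        # Edge of the box reached: stop at the edge, then look for a new box.
        w = np.clip(proposal, center - half, center + half)
        new_half = find_box(w)
        if new_half is None:
            break                             # nothing further can be verified
        center, half = w.copy(), new_half
        n_boxes += 1
    return w, n_boxes

# Toy learning signal: pull the weights toward a target outside the stable ball.
target = np.array([4.0, 0.0])
w_final, n_boxes = learn_with_bounds(np.zeros(2), lambda w: 0.1 * (target - w))
print(w_final, n_boxes)
```

In this toy run the weights move toward the target but stop short of the boundary of the verifiable region, because each update is accepted only inside a box that has already passed the test.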

Trajectory of Weights and Bounds on Regions of Stability

[Figure: weight space (high-dimensional), showing the initial weight vector, the final weight vector, an UNSTABLE REGION, and regions labeled A, B, C, D, E. Two paths are drawn: the weight trajectory with robust constraints and the weight trajectory without robust constraints.]

Motivation

Robust control theory

Guarantees stability

Results in less aggressive controllers

Reinforcement learning

Optimizes the performance of a controller

No guarantee of stability while learning

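Experimental HVAC System

[Figure: schematic of the experimental HVAC system. Labeled components: external air, return air, mixing box, filter, heating coil, valve, water heater, fan, variable frequency drive, air flow, discharge air, and an interface with inputs and outputs. Temperature sensors are marked T. Signal labels: TAE, CDE, CDR, TAR, TAI, TWS, CWH, CVP, FW, TAO, FA, CBS, PE, TWI, EP, TWO.]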

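Reinforcement Learning

The value function is the expected discounted sum of reinforcement:

$$Q(s_t,a_t) = E\left\{ \sum_{k=0}^{T} \gamma^{k} R(s_{t+k},a_{t+k}) \right\}$$

where $s_t$ is the state, $a_t$ is the action chosen by the policy function, $R$ is the reinforcement (|error|), $\gamma$ is the discount factor, and $Q$ is the value function.

Separating the first term of the sum gives a recursive form:

$$\begin{aligned}
Q(s_t,a_t) &= E\left\{ R(s_t,a_t) + \sum_{k=1}^{T} \gamma^{k} R(s_{t+k},a_{t+k}) \right\} \\
&= E\left\{ R(s_t,a_t) + \gamma \sum_{k=0}^{T-1} \gamma^{k} R(s_{t+k+1},a_{t+k+1}) \right\} \\
&= E\left\{ R(s_t,a_t) + \gamma\, Q(s_{t+1},a_{t+1}) \right\}
\end{aligned}$$

The last line uses the definition of $Q$ at time $t+1$, which is exact as $T \to \infty$.

Subtract right side from left to get algorithm for updating Q

$$\Delta Q(s_t,a_t) = E\left\{ R(s_t,a_t) + \gamma\, Q(s_{t+1},a_{t+1}) \right\} - Q(s_t,a_t)$$

Replace expectation with sample (Monte Carlo approach)

$$\Delta Q(s_t,a_t) = \alpha_t \left[ R(s_t,a_t) + \gamma\, Q(s_{t+1},a_{t+1}) - Q(s_t,a_t) \right]$$

The bracketed term is the temporal-difference error, and $\alpha_t$ is the learning rate.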
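A minimal sketch of the sample-based update may help make the temporal-difference error concrete. It assumes a tabular `Q` indexed by integer states and actions, which is a simplification chosen for illustration (the slides use neural networks); the function name `td_update` and the values of `alpha` and `gamma` are hypothetical.

```python
import numpy as np

def td_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """Apply one sample-based update to Q[s, a] and return the TD error."""
    # Temporal-difference error: sampled target minus current estimate.
    td_error = r + gamma * Q[s_next, a_next] - Q[s, a]
    # Move the estimate a fraction alpha toward the sampled target.
    Q[s, a] += alpha * td_error
    return td_error

# Usage: 3 states, 2 actions, all values initially zero.
Q = np.zeros((3, 2))
delta = td_update(Q, s=0, a=1, r=1.0, s_next=1, a_next=0)
print(delta, Q[0, 1])
```

With all values starting at zero, the first update has a temporal-difference error equal to the observed reinforcement, and the stored value moves by `alpha` times that error.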

Robust Control based on IQCs

[Figure: feedback interconnection of Uncertainties ($\Delta$) and Controller/Plant ($M$), linked by the signals $v$ and $w$.]

An Integral Quadratic Constraint (IQC) describes the relationship between signals as

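$$\int_{-\infty}^{\infty} \begin{bmatrix} \hat{v}(j\omega) \\ \hat{w}(j\omega) \end{bmatrix}^{*} \Pi(j\omega) \begin{bmatrix} \hat{v}(j\omega) \\ \hat{w}(j\omega) \end{bmatrix} d\omega \ge 0$$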

Stability of the closed loop system is guaranteed if

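$$\begin{bmatrix} M(j\omega) \\ I \end{bmatrix}^{*} \Pi(j\omega) \begin{bmatrix} M(j\omega) \\ I \end{bmatrix} \le -\epsilon I$$

for all $\omega$ and for some $\epsilon > 0$.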

Given specific IQCs for a particular system, this inequality problem becomes a linear matrix inequality (LMI) problem.
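As a rough illustration of the stability condition, the sketch below evaluates the largest eigenvalue of the matrix on the left side of the inequality over a grid of frequencies. It is a numerical spot check, not the LMI formulation and not a proof, since it only samples frequencies. The plant M(s) = 0.5/(s+1), the small-gain multiplier diag(I, -I) for an uncertainty with gain at most 1, and the name `iqc_margin` are all assumptions made for this example.

```python
import numpy as np

def iqc_margin(M, Pi, omegas):
    """Largest eigenvalue of [M; I]^* Pi [M; I] over the sampled frequencies.

    M(omega) returns the frequency response matrix at omega and Pi(omega)
    returns a Hermitian multiplier. A negative result means the inequality
    holds at every sampled frequency with epsilon = -result.
    """
    worst = -np.inf
    for omega in omegas:
        M_w = np.atleast_2d(M(omega))
        stacked = np.vstack([M_w, np.eye(M_w.shape[1])])
        H = stacked.conj().T @ Pi(omega) @ stacked
        H = (H + H.conj().T) / 2          # symmetrize against round-off
        worst = max(worst, np.linalg.eigvalsh(H).max())
    return worst

# Hypothetical example: M(s) = 0.5 / (s + 1) with a small-gain multiplier.
M = lambda omega: np.array([[0.5 / (1j * omega + 1.0)]])
Pi = lambda omega: np.diag([1.0, -1.0])
omegas = np.logspace(-3, 3, 400)

margin = iqc_margin(M, Pi, omegas)
print(margin, margin < 0)
```

For this scalar example the quantity reduces to |M(jω)|² − 1, so the worst case occurs at the lowest sampled frequency, where the gain of M is largest.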

[Figure: reference and output signals for the nominal and perturbed cases. Nominal: good response. Perturbed: terrible response.]

Robust Reinforcement Learning

[Figure panels: perturbed case, no learning; perturbed case, with learning.]

Through learning, the controller has been fine-tuned to the actual dynamics of the real plant without losing the guarantee of stability!

Sum Squared Error

Nominal Controller: 0.646

Robust Controller: 0.286

Robust RL Controller: 0.243

Conclusions

• IQC bounds on parameters of tanh and sigmoid networks exist for which the combination of a reinforcement learning agent and feedback control system satisfies the requirements of robust stability theorems (static and dynamic stability).

• Robust reinforcement learning algorithm improves control performance while avoiding instability on several simulated problems.

• Reinforcement learning is now more acceptable in practical applications as an adaptive controller that modifies its behavior over time, due to the guarantees of stability.

• Initial, conservative robust controller becomes more aggressive through adaptation to actual physical system.

• See http://www.cs.colostate.edu/~anderson/res/rl/

Integral Quadratic Constraints

[Figure: block diagrams involving the system M; the remaining block and signal labels are not recoverable.]

Neural Net and Robust Control with IQCs

[Figure: block diagram. Bounds on neural net weight adjustment in green; neural net as reinforcement learning actor in blue; robust controller and plant in red.]

First Example

Second Example

Without robust constraints, becomes unstable before learning final stable solution.

Third Example

Distillation Column

Fourth Example

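1st Order

$$x(t+1) = x(t) + u(t), \qquad y(t) = x(t)$$

2nd Order

$$x(t+1) = \begin{bmatrix} 1 & 0.05 \\ -0.05 & 0.9 \end{bmatrix} x(t) + \begin{bmatrix} 0 \\ 1 \end{bmatrix} u(t), \qquad y(t) = \begin{bmatrix} 1 & 0 \end{bmatrix} x(t)$$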
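To get a feel for the two example plants, the sketch below simulates each in closed loop. Several things here are assumptions rather than part of the slides: the controller is a plain proportional law u = kp (r − y) chosen only for illustration, the gain kp = 0.1 is arbitrary, and the lower-left entry of the 2nd-order state matrix is taken to be −0.05.

```python
import numpy as np

# 1st-order plant: x(t+1) = x(t) + u(t), y(t) = x(t)
A1, B1, C1 = np.array([[1.0]]), np.array([[1.0]]), np.array([[1.0]])

# 2nd-order plant (the -0.05 entry is an assumption, see the note above)
A2 = np.array([[1.0, 0.05], [-0.05, 0.9]])
B2 = np.array([[0.0], [1.0]])
C2 = np.array([[1.0, 0.0]])

def simulate(A, B, C, kp, r=1.0, steps=300):
    """Simulate the plant under the hypothetical law u = kp * (r - y)."""
    x = np.zeros((A.shape[0], 1))
    outputs = []
    for _ in range(steps):
        y = (C @ x).item()
        outputs.append(y)
        u = kp * (r - y)
        x = A @ x + B * u
    return np.array(outputs)

def closed_loop_radius(A, B, C, kp):
    """Spectral radius of A - kp*B*C; below 1 means a stable closed loop."""
    return np.abs(np.linalg.eigvals(A - kp * (B @ C))).max()

y1 = simulate(A1, B1, C1, kp=0.1)
y2 = simulate(A2, B2, C2, kp=0.1)
rho2 = closed_loop_radius(A2, B2, C2, kp=0.1)
print(y1[-1], y2[-1], rho2)
```

Under these assumptions the 1st-order loop settles at the reference, while the 2nd-order loop is stable but settles below the reference, which is the kind of tracking error a learning agent added in parallel could reduce.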

Without robust constraints, becomes unstable before learning final stable solution.
