adaptive critic design for aircraft control silvia ferrari advisor: prof. robert f. stengel...

Adaptive Critic Design for Aircraft Control

Silvia Ferrari

Advisor: Prof. Robert F. Stengel

Princeton University

FAA/NASA Joint University Program on Air Transportation,

MIT, Cambridge, MA

January 18-19, 2001

Table of Contents

• Introduction and motivation

• Optimal control problem

• Dynamic programming formulation

• Adaptive critic designs

• Proportional-Integral Neural Network Controller

• Pre-training of action and critic networks

• On-line training of action and critic networks

• Summary and Conclusions

Introduction

• Gain scheduled linear controllers for nonlinear systems

• Classical/neural synthesis of control systems Prior knowledge Adaptive control and artificial neural networks

• Adaptive critics Learn in real time Cope with noise Cope with many variables Plan over time in a complex way ...

Action network takes immediate control action

Critic network estimates projected cost

Motivation for Neural Network-Based Controller

• Network of networks motivated by linear control structure

• Multiphase learning:

Pre-training

On-line training during piloted simulations or testing

• Improved global control

• Pre-training phase provides:

A global neural network controller

An excellent initialization point for on-line learning

• On-line training accounts for:

Differences between actual and assumed dynamic models

Nonlinear effects not captured in linearizations

Nonlinear Business Jet Aircraft Model

Aircraft Equations of Motion:

tttt

ttttdt

td

upxhy

upxfxx

,,

,,

State vector:

Control vector:

Parameters:

Output vector:

r

m

n

t

t

t

t

y

p

u

x

XB

ZB Thrust

Drag

Lift

V

mgp

q

r

YB

Bolza Type Cost Function Minimization:

dttJft

t

ff 0

,,, uxx L

Value Function Minimization

The minimization of J can be imbedded in the following problem:

dttttVf

f

t

t

fftt

,,,,, uxxux L

Value Function for [t, tf]:

dttttVf

f

t

t

ff

tt

,,,min,* uxxxu

L

V

Time[x(tf), tf]

t0 t tf

J

Time

[x(tf), tf]

t0 t tf

Discretized Optimization Problem

Approximate the equations of motion by a difference equation:

,,,1 kkkk D upxfx

Similarly, the cost function:

1

0

,,,fk

kDff kkkkkJ uxx L

The cost of operation during the last stage is:

1,1,1,1,1 ffDffffkk kkkkkkVff

uxxux L

where: tktkttk ffk ,1,,,0 ,/ ff ktt ,tkk xx

The Principle of Optimality

Similarly, the optimal cost over the last two intervals is given by:

1,12,2min

2

,11,2

,2*

ffkkffDkk

fkk

kkVkk

kV

ffff

ff

uxux

x

uuL

1,1,min11

,1*

ffDffkfkk kkkkkV

fff uxxx

uL

The optimal cost, from to , is then:1fkt fkt

By the principle of optimality,

a

b

cGiven Vab,

if Vabc is optimal from a to c,

then Vbc is optimal from b to c: V*abc = Vab + V*

bc

A Recurrence Relationship of Dynamic Programming

Then, for a k-stage process:

ff

f

f

ffff

ff

knkffDnk

k

nkkDffknknkfknk

Vnknk

kkkknkV

,1*

1

1,,1,,

*

,min

,,min

ux

uxxx

u

uuu

L

L

Backward Dynamic Programming

So, the optimal cost for a 2-stage process can be re-written as:

fff

ff kkffDkfkk VkkkV ,1*

2,2

* 2,2min2

uxxu

L

b

c

e

d

f

Vcf

Vdf

Vef

Vbc

Vbd

Vbe

• Begin at tf

• Move backward to t0

• Store all u and V

• Off-line processa b'

Forward Dynamic Programming

• Estimate cost-to-go function,

• Determine immediate control action

• Move forward in time, updating cost-to-go function

• On-line process

Vbc

ccfV

bf

a

bfV

Vab

Cost associated with going from tk to tf:

1, , kkkkkk tVttUVtVf

xxuxx

Utility Estimated cost for fk ttt 1

Hereon, let t = tk, t + 1 = tk+1, etc...

Adaptive Critic Designs .. at a Glance!

Legend:'H': Heuristic'DP': Dynamic Programming'D': Dual'P': Programming'G': Global'AD': Action Dependent

HDP

uNNA

NNCx V

x

DHP

NNA

NNCx

x

x /V

u

GDHP

NNA

NNCx

x u

x /VV

AD

u

NNC

NNAx

x

ADHDP ADDHP ADGDHP

Proportional-Integral Controller

Closed-loop stability: ,ct xx 0~ ty ,ct uu

yc = desired output, (xc,uc) = set point.

ycx(t)

ys(t)

CI

CF

CB

Hu Hx

+

+

+

+

-

-

-

u(t)AIRCRAFT

ty~ (t)

Omitting 's, for simplicity: ,~

ctst yyy ,,~ ctt uuu

Assuming small perturbations, expand about nominal solution:

ttt

ttt

uHxHy

uGxFx

ux

where: u

uxhuH

x

uxhxH

u

uxfG

x

uxfF

,,

,,

,,

,

The state variable is augmented to include the output error integral, (t),

ttttt a

TTTa uH

Gx0H0Fξxx

ux

~~

and the Proportional-Integral cost function is defined as:

dJ TT

aa

T

a

0

~~~22

1uRuuSxxZx

Linearized Business Jet Aircraft Model

where P is a Riccati matrix.

Furthermore,

From the Euler-Lagrange equations, the optimal* control law is:

Linear Proportional-Integral Control Law Formulation

tt

tt

aIBa

aTT

xCCxC

xMPGRu

1*~

it can be shown that the optimal value function is

tttV a

T

aa xPxx 2

1*

and its partial derivative with respect to xa is:

Pxxx

ttV T

aa

a

*

Where: ,ct xx ,0~ ty ,ct uu cs t yy

Proportional-Integral Neural Network Controller

ycx(t)

˜ y t

+

+

+

+

-

u(t)uc

+

-

uB(t)

uI(t)

xc

ys(t)

e

a

NNI

NNF

NNBSVG

CSG

AIRCRAFT

tx~

tts uxh ,

NNCV/xa(t)

Feedback and Command-Integral (Action)Neural Network Pre-Training

Feedback Requirements (pre-training phase):

(R1)

(R2)

z 0,a k 0

z

px k

uB x

k

CB a k TTTax ppp ~

NNB accounts for regulation, uB = NNB(x, a).

For each operating point, k,

a z output and inputs:

~

Command-Integral Requirements (pre-training phase):

NNI provides dynamic compensation, uI = NNI(, a).


a z output and inputs:(R1)

(R2)

z 0,a k 0

z

p k

u I

k

C I ak TTTappp

Critic Neural Network Pre-Training

Critic Requirements (pre-training phase):

NNC estimates the value function derivatives, = NNC(xa, a).


a z output and training inputs:

TTT

a ax ppp

From the Proportional-Integral optimal value function derivatives:

• 00λPxxx

xλ

ttV

t T

aa

a

a

*

• P

x

tV

a

2

*2

(R1)

(R2)

k

kak

k

V

a

aPxp

z

a0z

x

2

*2

0,

Action and Critic Neural Network On-line Training by Dual Heuristic Programming

The cost-to-go at time t,

The same cost function is optimized in pre-training and in on-line learning

0

~~~22

1

t

TT

aa

T

a ttttttJ uRuuSxxZx

The utility function is defined as:

ttttttttU TT

aa

T

aa uRuuSxxZxux ~~~22

1~,

,1~, tVttUtV aaaa xxuxx

must be minimized w.r.t. , for an estimated cost-to-go at (t + 1). tu~

Optimality Conditions for Dual Heuristic Programming

t

t

t

U

t

U

t

tVt

a

a

aa

aa x

xu

uxx

xxλ

~

~

t

tt

a

aa x

xxλ

11

,

~

~1

1

ca

aaa t

t

t

tt F

x

xu

u

xxλ

0

u

x

t

tV a

~

0Fu

xxλ

u

A

aa t

tt

t

U~

11~

Differentiating both sides of the value function, V[xa(t)], w.r.t. xa(t):

Differentiating w.r.t. the control, :~ tu

Optimality equation:

NNC

Aicraft Model(Discrete Time)

Utility

NNA

AF

xa(t) tu~

t

U

u~

1 taxλ

tta

u

x~

1

axa(t+1)

Action Network On-line Training

Train action network, at time t, holding the critic parameters fixed

[Balakrishnan and Biega, 1996]

NNB

NNI

++

tu~

uB(t)

uI(t)

a

tx~

(t)

Critic Network On-line Training

Train critic network, at time t, holding the action parameters fixed

[Balakrishnan and Biega, 1996]

NNC

Utility

NNA

CF

xa(t) tu~

t

U

ax

1 taxλ

,~

1

t

ta

u

x

axa(t+1)

NNC

t

t

a

a

x

x

1

+

taxλ *

taxλ

Aicraft Model(Discrete Time)

Summary

• Aircraft optimal control problem

• Linear multivariable control structure

• Corresponding nonlinear controller

• Adaptive critic architecture:

Action and critic networks

Algebraic pre-training based on a-priori knowledge

On-line training during simulations (severe conditions)

Conclusions and Future Work

• Nonlinear, adaptive, real-time controller:

Maintain stability and robustness throughout flight envelope

Improve aircraft control performance under extreme conditions

• Systematic approach for designing nonlinear control systems motivated by well-known linear control structures

• Innovative neural network training techniques

Future Work:

• Adaptive critic architecture implementation

• Testing: acrobatic maneuvers, severe operating conditions,

coupling and nonlinear effects

adaptive critic design for aircraft control silvia ferrari advisor: prof. robert f. stengel...

Documents

control vector

optimal cost

pretraining online training

action network

output error integral

optimal value function

dynamic programmingd

dynamic programmingso