adaptive critic design for aircraft control silvia ferrari advisor: prof. robert f. stengel...
TRANSCRIPT
Adaptive Critic Design for Aircraft Control
Silvia Ferrari
Advisor: Prof. Robert F. Stengel
Princeton University
FAA/NASA Joint University Program on Air Transportation,
MIT, Cambridge, MA
January 18-19, 2001
Table of Contents
• Introduction and motivation
• Optimal control problem
• Dynamic programming formulation
• Adaptive critic designs
• Proportional-Integral Neural Network Controller
• Pre-training of action and critic networks
• On-line training of action and critic networks
• Summary and Conclusions
Introduction
• Gain scheduled linear controllers for nonlinear systems
• Classical/neural synthesis of control systems Prior knowledge Adaptive control and artificial neural networks
• Adaptive critics Learn in real time Cope with noise Cope with many variables Plan over time in a complex way ...
Action network takes immediate control action
Critic network estimates projected cost
Motivation for Neural Network-Based Controller
• Network of networks motivated by linear control structure
• Multiphase learning:
Pre-training
On-line training during piloted simulations or testing
• Improved global control
• Pre-training phase provides:
A global neural network controller
An excellent initialization point for on-line learning
• On-line training accounts for:
Differences between actual and assumed dynamic models
Nonlinear effects not captured in linearizations
Nonlinear Business Jet Aircraft Model
Aircraft Equations of Motion:
tttt
ttttdt
td
upxhy
upxfxx
,,
,,
State vector:
Control vector:
Parameters:
Output vector:
r
m
n
t
t
t
t
y
p
u
x
XB
ZB Thrust
Drag
Lift
V
mgp
q
r
YB
Bolza Type Cost Function Minimization:
dttJft
t
ff 0
,,, uxx L
Value Function Minimization
The minimization of J can be imbedded in the following problem:
dttttVf
f
t
t
fftt
,,,,, uxxux L
Value Function for [t, tf]:
dttttVf
f
t
t
ff
tt
,,,min,* uxxxu
L
V
Time[x(tf), tf]
t0 t tf
J
Time
[x(tf), tf]
t0 t tf
Discretized Optimization Problem
Approximate the equations of motion by a difference equation:
,,,1 kkkk D upxfx
Similarly, the cost function:
1
0
,,,fk
kDff kkkkkJ uxx L
The cost of operation during the last stage is:
1,1,1,1,1 ffDffffkk kkkkkkVff
uxxux L
where: tktkttk ffk ,1,,,0 ,/ ff ktt ,tkk xx
The Principle of Optimality
Similarly, the optimal cost over the last two intervals is given by:
1,12,2min
2
,11,2
,2*
ffkkffDkk
fkk
kkVkk
kV
ffff
ff
uxux
x
uuL
1,1,min11
,1*
ffDffkfkk kkkkkV
fff uxxx
uL
The optimal cost, from to , is then:1fkt fkt
By the principle of optimality,
a
b
cGiven Vab,
if Vabc is optimal from a to c,
then Vbc is optimal from b to c: V*abc = Vab + V*
bc
A Recurrence Relationship of Dynamic Programming
Then, for a k-stage process:
ff
f
f
ffff
ff
knkffDnk
k
nkkDffknknkfknk
Vnknk
kkkknkV
,1*
1
1,,1,,
*
,min
,,min
ux
uxxx
u
uuu
L
L
Backward Dynamic Programming
So, the optimal cost for a 2-stage process can be re-written as:
fff
ff kkffDkfkk VkkkV ,1*
2,2
* 2,2min2
uxxu
L
b
c
e
d
f
Vcf
Vdf
Vef
Vbc
Vbd
Vbe
• Begin at tf
• Move backward to t0
• Store all u and V
• Off-line processa b'
Forward Dynamic Programming
• Estimate cost-to-go function,
• Determine immediate control action
• Move forward in time, updating cost-to-go function
• On-line process
Vbc
ccfV
bf
a
bfV
Vab
Cost associated with going from tk to tf:
1, , kkkkkk tVttUVtVf
xxuxx
Utility Estimated cost for fk ttt 1
Hereon, let t = tk, t + 1 = tk+1, etc...
Adaptive Critic Designs .. at a Glance!
Legend:'H': Heuristic'DP': Dynamic Programming'D': Dual'P': Programming'G': Global'AD': Action Dependent
HDP
uNNA
NNCx V
x
DHP
NNA
NNCx
x
x /V
u
GDHP
NNA
NNCx
x u
x /VV
AD
u
NNC
NNAx
x
ADHDP ADDHP ADGDHP
Proportional-Integral Controller
Closed-loop stability: ,ct xx 0~ ty ,ct uu
yc = desired output, (xc,uc) = set point.
ycx(t)
ys(t)
CI
CF
CB
Hu Hx
+
+
+
+
-
-
-
u(t)AIRCRAFT
ty~ (t)
Omitting 's, for simplicity: ,~
ctst yyy ,,~ ctt uuu
Assuming small perturbations, expand about nominal solution:
ttt
ttt
uHxHy
uGxFx
ux
where: u
uxhuH
x
uxhxH
u
uxfG
x
uxfF
,,
,,
,,
,
The state variable is augmented to include the output error integral, (t),
ttttt a
TTTa uH
Gx0H0Fξxx
ux
~~
and the Proportional-Integral cost function is defined as:
dJ TT
aa
T
a
0
~~~22
1uRuuSxxZx
Linearized Business Jet Aircraft Model
where P is a Riccati matrix.
Furthermore,
From the Euler-Lagrange equations, the optimal* control law is:
Linear Proportional-Integral Control Law Formulation
tt
tt
aIBa
aTT
xCCxC
xMPGRu
1*~
it can be shown that the optimal value function is
tttV a
T
aa xPxx 2
1*
and its partial derivative with respect to xa is:
Pxxx
ttV T
aa
a
*
Where: ,ct xx ,0~ ty ,ct uu cs t yy
Proportional-Integral Neural Network Controller
ycx(t)
˜ y t
+
+
+
+
-
u(t)uc
+
-
uB(t)
uI(t)
xc
ys(t)
e
a
NNI
NNF
NNBSVG
CSG
AIRCRAFT
tx~
tts uxh ,
NNCV/xa(t)
Feedback and Command-Integral (Action)Neural Network Pre-Training
Feedback Requirements (pre-training phase):
(R1)
(R2)
z 0,a k 0
z
px k
uB x
k
CB a k TTTax ppp ~
NNB accounts for regulation, uB = NNB(x, a).
For each operating point, k,
a z output and inputs:
~
Command-Integral Requirements (pre-training phase):
NNI provides dynamic compensation, uI = NNI(, a).
For each operating point, k,
a z output and inputs:(R1)
(R2)
z 0,a k 0
z
p k
u I
k
C I ak TTTappp
Critic Neural Network Pre-Training
Critic Requirements (pre-training phase):
NNC estimates the value function derivatives, = NNC(xa, a).
For each operating point, k,
a z output and training inputs:
TTT
a ax ppp
From the Proportional-Integral optimal value function derivatives:
• 00λPxxx
xλ
ttV
t T
aa
a
a
*
• P
x
tV
a
2
*2
(R1)
(R2)
k
kak
k
V
a
aPxp
z
a0z
x
2
*2
0,
Action and Critic Neural Network On-line Training by Dual Heuristic Programming
The cost-to-go at time t,
The same cost function is optimized in pre-training and in on-line learning
0
~~~22
1
t
TT
aa
T
a ttttttJ uRuuSxxZx
The utility function is defined as:
ttttttttU TT
aa
T
aa uRuuSxxZxux ~~~22
1~,
,1~, tVttUtV aaaa xxuxx
must be minimized w.r.t. , for an estimated cost-to-go at (t + 1). tu~
Optimality Conditions for Dual Heuristic Programming
t
t
t
U
t
U
t
tVt
a
a
aa
aa x
xu
uxx
xxλ
~
~
t
tt
a
aa x
xxλ
11
,
~
~1
1
ca
aaa t
t
t
tt F
x
xu
u
xxλ
0
u
x
t
tV a
~
0Fu
xxλ
u
A
aa t
tt
t
U~
11~
Differentiating both sides of the value function, V[xa(t)], w.r.t. xa(t):
Differentiating w.r.t. the control, :~ tu
Optimality equation:
NNC
Aicraft Model(Discrete Time)
Utility
NNA
AF
xa(t) tu~
t
U
u~
1 taxλ
tta
u
x~
1
axa(t+1)
Action Network On-line Training
Train action network, at time t, holding the critic parameters fixed
[Balakrishnan and Biega, 1996]
NNB
NNI
++
tu~
uB(t)
uI(t)
a
tx~
(t)
Critic Network On-line Training
Train critic network, at time t, holding the action parameters fixed
[Balakrishnan and Biega, 1996]
NNC
Utility
NNA
CF
xa(t) tu~
t
U
ax
1 taxλ
,~
1
t
ta
u
x
axa(t+1)
NNC
t
t
a
a
x
x
1
+
taxλ *
taxλ
Aicraft Model(Discrete Time)
Summary
• Aircraft optimal control problem
• Linear multivariable control structure
• Corresponding nonlinear controller
• Adaptive critic architecture:
Action and critic networks
Algebraic pre-training based on a-priori knowledge
On-line training during simulations (severe conditions)
Conclusions and Future Work
• Nonlinear, adaptive, real-time controller:
Maintain stability and robustness throughout flight envelope
Improve aircraft control performance under extreme conditions
• Systematic approach for designing nonlinear control systems motivated by well-known linear control structures
• Innovative neural network training techniques
Future Work:
• Adaptive critic architecture implementation
• Testing: acrobatic maneuvers, severe operating conditions,
coupling and nonlinear effects