gramian based model reduction of large-scale dynamical systems · of large-scale dynamical systems...
TRANSCRIPT
Gramian Based Model Reductionof Large-scale Dynamical Systems
Paul Van Dooren
Universite catholique de LouvainCenter for Systems Engineering and Applied Mechanics
B-1348 Louvain-la-Neuve, [email protected]
• Origin and motivation
Ex: Storm surge forecasting
• Typical techniques (Gramians)
Linear time-invariant
Linear time-varying
Non-linear
• Numerical/Algorithmic issues (Krylov)
1
Storm surge forecasting in the North Sea
see Verlaan-Heemink ’97
0
1000
1500
2000
2500
3000
3500
500
20 40 60 80 100 120 140 160 180 200
0
20
40
60
80
100
120
140
160
n depth
Wick
North Shields
m
Dover
Delfzijl
Lowestoft
Huibertgat
Sheerness
Harlingen
Vlissingen
IJmuiden-II
HoekvanHolland
Den Helder
ProblemUsing measurements predict the state of the North Seavariables in order to operate the sluices in due time (6h.)
Solutionx(t)
.= [h(t), vx(t), vy(t)] satisfies the shallow water equations
∂x(t)/∂t = F (x(t), w(t))
y(t) = G(x(t), v(t))
with measurements y(t) and noise processes v(.), w(.)=⇒ estimate and predict x(t) using Kalman filtering
2
Some data : very few measurements (x’s and +’s)
water level current HF−radar station
0 20 40 60 80 100
420
430
440
450
460
470
480
490
500
510
Noordwijk
Scheveningen
Hook of Holland
but discretized state is very large-scale (60.000 variables)
0 20 40 60 80 100
420
430
440
450
460
470
480
490
500
510
x [km]
y [k
m]
3
Nevertheless it works ...
True water levels Estimated water levels
0.1
0.2 0.3
0.4 0.5 0.6 0.7
True water levels (t=1)ZUNOWAK
RIJKSWATERSTAAT RIKZ
ENGLAND
FRANCE BELGIUM
THE NETHERLANDS
GERMANY
0 20 40 60 80
0
20
40
60 0.1 0.2 0.3 0.4 0.5 0.6 0.7
Estimated water levels (t=1)ZUNOWAK
RIJKSWATERSTAAT RIKZ
ENGLAND
BELGIUM
THE NETHERLANDS
GERMANY
0 20 40 60 80
0
20
40
60
FRANCE
0.2
0.2 0.4
0.6 0.8
1 1.2
1.4
True water levels (t=3)ZUNOWAK
RIJKSWATERSTAAT RIKZ
ENGLAND
FRANCE BELGIUM
THE NETHERLANDS
GERMANY
0 20 40 60 80
0
20
40
60 0
0.2
0.2
0.4 0.6 0.8
1 1.2 1.4
Estimated water levels (t=3)ZUNOWAK
RIJKSWATERSTAAT RIKZ
ENGLAND
FRANCE BELGIUM
THE NETHERLANDS
GERMANY
0 20 40 60 80
0
20
40
60
-0.8 -0.6 -0.4
-0.2 -0.2 -0.2
-0.2
0 0
0
0.2
0.4
0.6
True water levels (t=8)ZUNOWAK
RIJKSWATERSTAAT RIKZ
ENGLAND
FRANCE BELGIUM
THE NETHERLANDS
GERMANY
0 20 40 60 80
0
20
40
60
-0.6 -0.4
-0.2
-0.2
0
0
0
0
0.2
0.4
0.6
Estimated water levels (t=8)ZUNOWAK
RIJKSWATERSTAAT RIKZ
ENGLAND
FRANCE BELGIUM
THE NETHERLANDS
GERMANY
0 20 40 60 80
0
20
40
60
Reconstruction works well around estuarium
4
Visualisation of computed variance of the error
0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
20 40 60 80 100 120 140 160 180 200
0
20
40
60
80
100
120
140
160
m
n
std water level [m]
Standard deviation of filter using 8 measurement locations
0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
20 40 60 80 100 120 140 160 180 200
0
20
40
60
80
100
120
140
160
m
n
std water level [m]
Standard deviation of filter using measurement locations(1 used for validation)
5
Suppose a ”good” discretization x(.) ∈ RN is given
Dynamical systems modeled via explicit equations
x(t) ∈ RN , N >> m, py1(.)←−y2(.)←−...yp(.)←−
←− u1(.)←− u2(.)...←− um(.)
discretize=⇒
continuous-time discrete-time{
x(t) = G(x(t), u(t))
y(t) = H(x(t), u(t))
{
x(k + 1) = G(x(k), u(k))
y(k) = H(x(k), u(k))
⇓ linearize ⇓
{
x(t)=A(t)x(t)+B(t)u(t)
y(t) = C(t)x(t)+D(t)u(t)
{
x(k+1) = A(k)x(k)+B(k)u(k)
y(k) = C(k)x(k)+D(k)u(k)
⇓ freeze time ⇓
{
x(t) = Ax(t) +Bu(t)
y(t) = Cx(t) +Du(t)
{
x(k + 1) = Ax(k) +Bu(k)
y(k) = Cx(k) +Du(k)
Many control problems require ≈ N 3/(∆t) operations.Because of cubic complexity in N ⇒ model reduction
6
Linear Time Invariant Systems
Given ”large model” {ANN , BNm, CpN}
{
x(t) = Ax(t) + Bu(t)
y(t) = Cx(t) +Du(t),
u(t) ∈ Rm, y(t) ∈ Rp, x(t) ∈ RN , N << m, p
find ”small model”{
Ann, Bnm, Cpn
}
with n << N
{
˙x(t) = Ax(t) + Bu(t)
y(t) = Cx(t) + Du(t),
driven by the same input u(t) with small error
‖y(t)− y(t)‖
Model reduction = find a smaller model, i.e. n << N :
• approximation problem
• stability is important
• measure is important
7
How to capture the essence of the system ?
Transfer functions and norms
H(s) = C(sIN −A)−1B +D, H(s) = C(sIn − A)
−1B + D,
are p×m rational matrices
try to match frequency responses
10−1
100
101
102
103
104
105
10−5
10−4
10−3
10−2
10−1
100
101
102
|Fre
quen
cy r
espo
nse|
Full order model10
−110
010
110
210
310
410
510
−5
10−4
10−3
10−2
10−1
100
101
102
Reduced order model
|Fre
quen
cy r
espo
nse|
by minimizing their difference using
‖H(.)− H(.)‖∞.= sup
ωσmax{H(jω)− H(jω)}
Theory :
• balanced truncation (Moore ’81)
• optimal Hankel norm approximation (Adamian-Arov-Krein’71, Glover ’90)
• interpolation (Gragg-Lindquist ’83)
Other references : Gallivan-Grimme-VanDooren, Jaimoukha-Kasanelly,
Villemagne-Skelton, Boley, Craig, Freund-Feldman, Sorensen-Antoulas,
...
8
Why ‖ . ‖∞ norm ?
Fourier transforms of signals :
uf(ω) = Fu(t), yf (ω) = Fy(t), yf(ω) = F y(t)
yields
yf (ω) = H(jω)uf (ω), yf (ω) = H(jω)uf(ω).
and hence a bound for e(t) .= [y(t)− y(t)] :
Fe(t) = ef(ω) = [H(jω)− H(jω)]uf(ω).
Minimize worst case error ‖ef(ω)‖2 for ‖uf(ω)‖2 = 1by minimizing
‖H(.)− H(.)‖∞.= sup
ω‖H(jω)− H(jω)‖2,
but this is a difficult norm to handle !
9
Use Hankel norm instead of ‖.‖∞ approximations
Consider the mapping “past inputs” =⇒ “future outputs”Continuous-time
y(t) =
∫ 0
−∞
CeA(t−τ)Bu(τ )dτ = CeAt ·
∫ ∞
0
eAτBu(−τ )dτ,
y(t) = CeAtx(0), x(0) =
∫ ∞
0
eAτBu(−τ )dτ.
−40 −35 −30 −25 −20 −15 −10 −5 0 5 10−0.6
−0.4
−0.2
0
0.2
0.4
0.6
0.8
1
Continuous input u(t)−10 −5 0 5 10 15 20 25 30 35 40
−0.6
−0.4
−0.2
0
0.2
0.4
0.6
0.8
1
Continuous output y(t)
Discrete-time
y(k) =
0∑
−∞
CA(k−j)Bu(j) = CAk ·
∞∑
0
AjBu(−j),
y(k) = CAkx(0), x(0) =
∞∑
0
AjBu(−j).
−40 −35 −30 −25 −20 −15 −10 −5 0 5 10−0.6
−0.4
−0.2
0
0.2
0.4
0.6
0.8
1
Discrete input u(k)−10 −5 0 5 10 15 20 25 30 35 40
−0.6
−0.4
−0.2
0
0.2
0.4
0.6
0.8
1
Discrete output y(k)
10
N ×N Gramians derived from the Hankel map
From y([0,∞)) = Ox(0), x(0) = Cu((−∞, 0)), define thedual mapsO∗ : y([0,∞)) 7→ x(0), C∗ : x(0) 7→ u((−∞, 0))and the (observability and controllability) Gramians
Go.= O∗O, Gc
.= CC∗
Continuous-time
Go =
∫ +∞
0
(CeAt)T (CeAt)dt, Gc =
∫ +∞
0
(eAtB)(eAtB)Tdt,
Discrete-time
Go =
+∞∑
0
(CAk)T (CAk), Gc =
+∞∑
0
(AkB)(AkB)T ,
Gramians can be viewed as “energy functions”Gc for past inputs −→ x(0)Go for x(0) −→ future outputs
Perform ordered eigendecompositionΛ = T −1(GcGo)T (withλn >> λn+1) and project on the first n coordinates :
T−1AT.=
[
A11 A12A21 A22
]
, T−1B.=
[
B1B2
]
, CT.=[
C1 C2]
.
{A, B, C}.= {A11, B1, C1}
11
Interpolating the frequency responseH(ω) seems a good ideasince Parseval’s theorem implies
Go =1
2π
∫ +∞
−∞
(−jωI − AT )−1CTC(jωI −A)−1dω,
Gc =1
2π
∫ +∞
−∞
(jωI −A)−1BBT (−jωI −AT )−1dω.
and
Go =1
2π
+∞∑
−∞
(e−jωI −AT )−1CTC(ejωI −A)−1
Gc =1
2π
+∞∑
−∞
(ejωI − A)−1BBT (e−jωI −AT )−1.
What technique to use ? The discrete-time case :
y0y1y2...
= OC
u−1u−2u−3
...
=
C
CA
CA2...
[
B AB A2B . . .]
u−1u−2u−3
...
... suggests to use Krylov spaces !
Kj(M,R) = Im{
R,MR,M 2R, . . . ,M j−1R}
12
Rational interpolation and moment matching
Let X and Y define a projector (Y TX = In) and{
A, B, C,D}
={
Y TAX, Y TB,CX,D}
Taylor series of H(s) .= C(sI − A)−1B +D around∞
H(s) = H0 +H1s−1 +H2s
−2 + · · · ,
where the momentsHi are equal to :
H0 = D, Hi = CAi−1B, i = 1, 2, ...
The reduced order model H(s) .= C(sI − A)−1B + Dhas a similar expansion
H(s) = H0 + H1s−1 + H2s
−2 + · · · ,
with moments Hi :
H0 = D, Hi = CAi−1B, i = 1, 2, ...
Moment matching of both models (Pade approximation) isobtained by using Krylov spaces (Gragg-Lindquist ’83)
13
Theorem : Letm = p, Y TX = In and assume
ImX = Im[
B,AB,A2B, . . . Ak−1B]
,
ImY = Im[
CT , ATCT , A2TCT , . . . A(k−1)TCT]
then the first 2k moments match :
Hj = Hj, j = 1, . . . , 2k.
Rational Krylov methods extend this to several points σi :
multipoint (Pade) approximations are obtained by just usingmodified Krylov spaces : (Gallivan-Grimme-VanDooren ’97)
ImX =⋃
i
Kji{(A− σiI)−1, (A− σiI)
−1B}
ImY =⋃
i
Kji{(AT − σiI)
−1, (AT − σiI)−1CT}
in the above theorem
14
Comparison “Optimal” and rational approximations
15th order approximation of 120 th order CD player
10−1
100
101
102
103
104
105
−250
−200
−150
−100
−50
0
50
100
|Fre
quen
cy r
espo
nses
| in
dB
15th order Hankel norm approximation10
−110
010
110
210
310
410
5−250
−200
−150
−100
−50
0
50
100
|Fre
quen
cy r
espo
nses
| in
dB15th order balanced truncation
10−1
100
101
102
103
104
105
−250
−200
−150
−100
−50
0
50
100
|Fre
quen
cy r
espo
nses
| in
dB
15th order RK with points 1,2,2,2,2,2,2,2,2,3,4,4,5,5,Inf
Legend : · · · Hankel norm -·-· Balanced truncation - - - Rational Krylov
Errors |T (.)− T (.)| ln(|T (.)− T (.)|)
Hankel 0.02 6.1Balanc. 0.04 4.1Rat. Kr. 4.02 1.5
Rational approximation looks better on a logarithmic scale
15
Time-varying systemsApproximate the (discrete) time-varying systems
{
x(k + 1) = A(k)x(k) +B(k)u(k)
y(k) = C(k)x(k) +D(k)u(k)
by a lower order models of same type. We notice that
y(k)
y(k+1)
y(k+2)...
=
C(k)
C(k+1)A(k)
C(k+2)A(k+1)A(k)...
x(k),
x(k) =
[
B(k−1) A(k−1)B(k−2) A(k−1)A(k−2)B(k−3) . . .]
u(k−1)
u(k−2)
u(k−3)...
which again suggests Krylov. Use low rank approximations
Go(k) ≈ So(k)STo (k), Gc(k) ≈ Sc(k)S
Tc (k),
where So(k) and Sc(k) are N × n matrices
Such approximations are obtained e.g. by keeping only the ndominant singular vectors at step k of the Krylov recurrence :
Sc(k) = SV Dn [B(k), A(k)Sc(k−1)]
16
Let’s go back to the Storm Surge example
The important matrix in this Kalman filtering problem is
P (k) = E{(x(k)− x(k))(x(k)− x(k))T}
which represents the “error covariance” of the estimation.
If we approximate
P (k) ≈ S(k)S(k)T , S(k) ∈ RN×n,
then we obtain the recurrence
S(k + 1)⇐ SV Dn
[
R(k) C(k)S(k) 0
0 A(k)S(k) B(k)Q(k)
]
where the new factor S(k+1) stays of rank n by a projection(using the SVD).
The only big matrix involved here is the sparse matrix A(k)which is multiplied with only n columns
Verlaan-Heemink ’97
17
Time-varying linearized problems
Consider the discrete-time system{
x(k + 1) = G(x(k), u(k))
y(k) = H(x(k), u(k)).
One could linearize along a “nominal” trajectory (x(k), u(k))and getA(.), B(.), C(.), D(.) from Taylor expansion ofG(., .), H(., .)
Simpler idea (POD) : (Holmes-Lumley-Berkooz ’96)
Use the “energy function” G .=∑kfk=kix(k)x(k)T .
From
x(k + 1) = A(k)x(k)
with initial conditions x(ki), we have
x(k) = Φ(k, ki)x(ki).
Therefore G looks like a Gramian :kf∑
k=ki
(Φ(k, ki)x(ki))(Φ(k, ki)x(ki))T .
Now project on its dominant subspace (POD)
18
Example : Use POD in CVD reactor (Ly-Tran ’99)
Schematic representation of a horizontalquartz reactor in a steel confinement shell
Compute state trajectories for one “typical” input :
Snap shots of “typical” states Ten dominant “states”
19
Concluding remarks
• large-scale is typically sparse
• time-stepping (simulation) is cheaper than control (opti-mization)
• find an “energy function” that is “cheap” and project onits dominant features
Future work
• find error bounds “on the fly”
• incorporate projections in closed loop
Further reading
P. Van Dooren, Gramian based model reduction of large-scaledynamical systems, in Numerical Analysis 1999, Chapmanand Hall, pp. 231-247, CRC Press, London, 2000.M. Verlaan and A. Heemink, Tidal flow forecasting using re-duced rank square root filters, Stochastic Hydrology and Hy-draulics, 11, pp. 349-368, 1997.
See also SIAM short course notes onhttp://www.auto.ucl.ac.be/˜vdooren/
20