gramian based model reduction of large-scale dynamical systems · of large-scale dynamical systems...

Gramian Based Model Reductionof Large-scale Dynamical Systems

Paul Van Dooren

Universite catholique de LouvainCenter for Systems Engineering and Applied Mechanics

B-1348 Louvain-la-Neuve, [email protected]

• Origin and motivation

Ex: Storm surge forecasting

• Typical techniques (Gramians)

Linear time-invariant

Linear time-varying

Non-linear

• Numerical/Algorithmic issues (Krylov)

1

Storm surge forecasting in the North Sea

see Verlaan-Heemink ’97

0

1000

1500

2000

2500

3000

3500

500

20 40 60 80 100 120 140 160 180 200

0

20

40

60

80

100

120

140

160

n depth

Wick

North Shields

m

Dover

Delfzijl

Lowestoft

Huibertgat

Sheerness

Harlingen

Vlissingen

IJmuiden-II

HoekvanHolland

Den Helder

ProblemUsing measurements predict the state of the North Seavariables in order to operate the sluices in due time (6h.)

Solutionx(t)

.= [h(t), vx(t), vy(t)] satisfies the shallow water equations

∂x(t)/∂t = F (x(t), w(t))

y(t) = G(x(t), v(t))

with measurements y(t) and noise processes v(.), w(.)=⇒ estimate and predict x(t) using Kalman filtering

2

Some data : very few measurements (x’s and +’s)

water level current HF−radar station

0 20 40 60 80 100

420

430

440

450

460

470

480

490

500

510

Noordwijk

Scheveningen

Hook of Holland

but discretized state is very large-scale (60.000 variables)

0 20 40 60 80 100

420

430

440

450

460

470

480

490

500

510

x [km]

y [k

m]

3

Nevertheless it works ...

True water levels Estimated water levels

0.1

0.2 0.3

0.4 0.5 0.6 0.7

True water levels (t=1)ZUNOWAK

RIJKSWATERSTAAT RIKZ

ENGLAND

FRANCE BELGIUM

THE NETHERLANDS

GERMANY

0 20 40 60 80

0

20

40

60 0.1 0.2 0.3 0.4 0.5 0.6 0.7

Estimated water levels (t=1)ZUNOWAK


ENGLAND

BELGIUM

THE NETHERLANDS

GERMANY

0 20 40 60 80

0

20

40

60

FRANCE

0.2

0.2 0.4

0.6 0.8

1 1.2

1.4



ENGLAND

FRANCE BELGIUM

THE NETHERLANDS

GERMANY

0 20 40 60 80

0

20

40

60 0

0.2

0.2

0.4 0.6 0.8

1 1.2 1.4



ENGLAND

FRANCE BELGIUM

THE NETHERLANDS

GERMANY

0 20 40 60 80

0

20

40

60

-0.8 -0.6 -0.4

-0.2 -0.2 -0.2

-0.2

0 0

0

0.2

0.4

0.6



ENGLAND

FRANCE BELGIUM

THE NETHERLANDS

GERMANY

0 20 40 60 80

0

20

40

60

-0.6 -0.4

-0.2

-0.2

0

0

0

0

0.2

0.4

0.6



ENGLAND

FRANCE BELGIUM

THE NETHERLANDS

GERMANY

0 20 40 60 80

0

20

40

60

Reconstruction works well around estuarium

4

Visualisation of computed variance of the error

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

20 40 60 80 100 120 140 160 180 200

0

20

40

60

80

100

120

140

160

m

n

std water level [m]

Standard deviation of filter using 8 measurement locations

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

20 40 60 80 100 120 140 160 180 200

0

20

40

60

80

100

120

140

160

m

n

std water level [m]

Standard deviation of filter using measurement locations(1 used for validation)

5

Suppose a ”good” discretization x(.) ∈ RN is given

Dynamical systems modeled via explicit equations

x(t) ∈ RN , N >> m, py1(.)←−y2(.)←−...yp(.)←−

←− u1(.)←− u2(.)...←− um(.)

discretize=⇒

continuous-time discrete-time{

x(t) = G(x(t), u(t))

y(t) = H(x(t), u(t))

{

x(k + 1) = G(x(k), u(k))

y(k) = H(x(k), u(k))

⇓ linearize ⇓

{

x(t)=A(t)x(t)+B(t)u(t)

y(t) = C(t)x(t)+D(t)u(t)

{

x(k+1) = A(k)x(k)+B(k)u(k)

y(k) = C(k)x(k)+D(k)u(k)

⇓ freeze time ⇓

{

x(t) = Ax(t) +Bu(t)

y(t) = Cx(t) +Du(t)

{

x(k + 1) = Ax(k) +Bu(k)

y(k) = Cx(k) +Du(k)

Many control problems require ≈ N 3/(∆t) operations.Because of cubic complexity in N ⇒ model reduction

6

Linear Time Invariant Systems

Given ”large model” {ANN , BNm, CpN}

{

x(t) = Ax(t) + Bu(t)

y(t) = Cx(t) +Du(t),

u(t) ∈ Rm, y(t) ∈ Rp, x(t) ∈ RN , N << m, p

find ”small model”{

Ann, Bnm, Cpn

}

with n << N

{

˙x(t) = Ax(t) + Bu(t)

y(t) = Cx(t) + Du(t),

driven by the same input u(t) with small error

‖y(t)− y(t)‖

Model reduction = find a smaller model, i.e. n << N :

• approximation problem

• stability is important

• measure is important

7

How to capture the essence of the system ?

Transfer functions and norms

H(s) = C(sIN −A)−1B +D, H(s) = C(sIn − A)

−1B + D,

are p×m rational matrices

try to match frequency responses

10−1

100

101

102

103

104

105

10−5

10−4

10−3

10−2

10−1

100

101

102

|Fre

quen

cy r

espo

nse|

Full order model10

−110

010

110

210

310

410

510

−5

10−4

10−3

10−2

10−1

100

101

102

Reduced order model

|Fre

quen

cy r

espo

nse|

by minimizing their difference using

‖H(.)− H(.)‖∞.= sup

ωσmax{H(jω)− H(jω)}

Theory :

• balanced truncation (Moore ’81)

• optimal Hankel norm approximation (Adamian-Arov-Krein’71, Glover ’90)

• interpolation (Gragg-Lindquist ’83)

Other references : Gallivan-Grimme-VanDooren, Jaimoukha-Kasanelly,

Villemagne-Skelton, Boley, Craig, Freund-Feldman, Sorensen-Antoulas,

...

8

Why ‖ . ‖∞ norm ?

Fourier transforms of signals :

uf(ω) = Fu(t), yf (ω) = Fy(t), yf(ω) = F y(t)

yields

yf (ω) = H(jω)uf (ω), yf (ω) = H(jω)uf(ω).

and hence a bound for e(t) .= [y(t)− y(t)] :

Fe(t) = ef(ω) = [H(jω)− H(jω)]uf(ω).

Minimize worst case error ‖ef(ω)‖2 for ‖uf(ω)‖2 = 1by minimizing

‖H(.)− H(.)‖∞.= sup

ω‖H(jω)− H(jω)‖2,

but this is a difficult norm to handle !

9

Use Hankel norm instead of ‖.‖∞ approximations

Consider the mapping “past inputs” =⇒ “future outputs”Continuous-time

y(t) =

∫ 0

−∞

CeA(t−τ)Bu(τ )dτ = CeAt ·

∫ ∞

0

eAτBu(−τ )dτ,

y(t) = CeAtx(0), x(0) =

∫ ∞

0

eAτBu(−τ )dτ.

−40 −35 −30 −25 −20 −15 −10 −5 0 5 10−0.6

−0.4

−0.2

0

0.2

0.4

0.6

0.8

1

Continuous input u(t)−10 −5 0 5 10 15 20 25 30 35 40

−0.6

−0.4

−0.2

0

0.2

0.4

0.6

0.8

1

Continuous output y(t)

Discrete-time

y(k) =

0∑

−∞

CA(k−j)Bu(j) = CAk ·

∞∑

0

AjBu(−j),

y(k) = CAkx(0), x(0) =

∞∑

0

AjBu(−j).

−40 −35 −30 −25 −20 −15 −10 −5 0 5 10−0.6

−0.4

−0.2

0

0.2

0.4

0.6

0.8

1

Discrete input u(k)−10 −5 0 5 10 15 20 25 30 35 40

−0.6

−0.4

−0.2

0

0.2

0.4

0.6

0.8

1

Discrete output y(k)

10

N ×N Gramians derived from the Hankel map

From y([0,∞)) = Ox(0), x(0) = Cu((−∞, 0)), define thedual mapsO∗ : y([0,∞)) 7→ x(0), C∗ : x(0) 7→ u((−∞, 0))and the (observability and controllability) Gramians

Go.= O∗O, Gc

.= CC∗

Continuous-time

Go =

∫ +∞

0

(CeAt)T (CeAt)dt, Gc =

∫ +∞

0

(eAtB)(eAtB)Tdt,

Discrete-time

Go =

+∞∑

0

(CAk)T (CAk), Gc =

+∞∑

0

(AkB)(AkB)T ,

Gramians can be viewed as “energy functions”Gc for past inputs −→ x(0)Go for x(0) −→ future outputs

Perform ordered eigendecompositionΛ = T −1(GcGo)T (withλn >> λn+1) and project on the first n coordinates :

T−1AT.=

[

A11 A12A21 A22

]

, T−1B.=

[

B1B2

]

, CT.=[

C1 C2]

.

{A, B, C}.= {A11, B1, C1}

11

Interpolating the frequency responseH(ω) seems a good ideasince Parseval’s theorem implies

Go =1

2π

∫ +∞

−∞

(−jωI − AT )−1CTC(jωI −A)−1dω,

Gc =1

2π

∫ +∞

−∞

(jωI −A)−1BBT (−jωI −AT )−1dω.

and

Go =1

2π

+∞∑

−∞

(e−jωI −AT )−1CTC(ejωI −A)−1

Gc =1

2π

+∞∑

−∞

(ejωI − A)−1BBT (e−jωI −AT )−1.

What technique to use ? The discrete-time case :

y0y1y2...

= OC

u−1u−2u−3

...

=

C

CA

CA2...

[

B AB A2B . . .]

u−1u−2u−3

...

... suggests to use Krylov spaces !

Kj(M,R) = Im{

R,MR,M 2R, . . . ,M j−1R}

12

Rational interpolation and moment matching

Let X and Y define a projector (Y TX = In) and{

A, B, C,D}

={

Y TAX, Y TB,CX,D}

Taylor series of H(s) .= C(sI − A)−1B +D around∞

H(s) = H0 +H1s−1 +H2s

−2 + · · · ,

where the momentsHi are equal to :

H0 = D, Hi = CAi−1B, i = 1, 2, ...

The reduced order model H(s) .= C(sI − A)−1B + Dhas a similar expansion

H(s) = H0 + H1s−1 + H2s

−2 + · · · ,

with moments Hi :

H0 = D, Hi = CAi−1B, i = 1, 2, ...

Moment matching of both models (Pade approximation) isobtained by using Krylov spaces (Gragg-Lindquist ’83)

13

Theorem : Letm = p, Y TX = In and assume

ImX = Im[

B,AB,A2B, . . . Ak−1B]

,

ImY = Im[

CT , ATCT , A2TCT , . . . A(k−1)TCT]

then the first 2k moments match :

Hj = Hj, j = 1, . . . , 2k.

Rational Krylov methods extend this to several points σi :

multipoint (Pade) approximations are obtained by just usingmodified Krylov spaces : (Gallivan-Grimme-VanDooren ’97)

ImX =⋃

i

Kji{(A− σiI)−1, (A− σiI)

−1B}

ImY =⋃

i

Kji{(AT − σiI)

−1, (AT − σiI)−1CT}

in the above theorem

14

Comparison “Optimal” and rational approximations

15th order approximation of 120 th order CD player

10−1

100

101

102

103

104

105

−250

−200

−150

−100

−50

0

50

100

|Fre

quen

cy r

espo

nses

| in

dB

15th order Hankel norm approximation10

−110

010

110

210

310

410

5−250

−200

−150

−100

−50

0

50

100

|Fre

quen

cy r

espo

nses

| in

dB15th order balanced truncation

10−1

100

101

102

103

104

105

−250

−200

−150

−100

−50

0

50

100

|Fre

quen

cy r

espo

nses

| in

dB

15th order RK with points 1,2,2,2,2,2,2,2,2,3,4,4,5,5,Inf

Legend : · · · Hankel norm -·-· Balanced truncation - - - Rational Krylov

Errors |T (.)− T (.)| ln(|T (.)− T (.)|)

Hankel 0.02 6.1Balanc. 0.04 4.1Rat. Kr. 4.02 1.5

Rational approximation looks better on a logarithmic scale

15

Time-varying systemsApproximate the (discrete) time-varying systems

{

x(k + 1) = A(k)x(k) +B(k)u(k)

y(k) = C(k)x(k) +D(k)u(k)

by a lower order models of same type. We notice that

y(k)

y(k+1)

y(k+2)...

=

C(k)

C(k+1)A(k)

C(k+2)A(k+1)A(k)...

x(k),

x(k) =

[

B(k−1) A(k−1)B(k−2) A(k−1)A(k−2)B(k−3) . . .]

u(k−1)

u(k−2)

u(k−3)...

which again suggests Krylov. Use low rank approximations

Go(k) ≈ So(k)STo (k), Gc(k) ≈ Sc(k)S

Tc (k),

where So(k) and Sc(k) are N × n matrices

Such approximations are obtained e.g. by keeping only the ndominant singular vectors at step k of the Krylov recurrence :

Sc(k) = SV Dn [B(k), A(k)Sc(k−1)]

16

Let’s go back to the Storm Surge example

The important matrix in this Kalman filtering problem is

P (k) = E{(x(k)− x(k))(x(k)− x(k))T}

which represents the “error covariance” of the estimation.

If we approximate

P (k) ≈ S(k)S(k)T , S(k) ∈ RN×n,

then we obtain the recurrence

S(k + 1)⇐ SV Dn

[

R(k) C(k)S(k) 0

0 A(k)S(k) B(k)Q(k)

]

where the new factor S(k+1) stays of rank n by a projection(using the SVD).

The only big matrix involved here is the sparse matrix A(k)which is multiplied with only n columns

Verlaan-Heemink ’97

17

Time-varying linearized problems

Consider the discrete-time system{

x(k + 1) = G(x(k), u(k))

y(k) = H(x(k), u(k)).

One could linearize along a “nominal” trajectory (x(k), u(k))and getA(.), B(.), C(.), D(.) from Taylor expansion ofG(., .), H(., .)

Simpler idea (POD) : (Holmes-Lumley-Berkooz ’96)

Use the “energy function” G .=∑kfk=kix(k)x(k)T .

From

x(k + 1) = A(k)x(k)

with initial conditions x(ki), we have

x(k) = Φ(k, ki)x(ki).

Therefore G looks like a Gramian :kf∑

k=ki

(Φ(k, ki)x(ki))(Φ(k, ki)x(ki))T .

Now project on its dominant subspace (POD)

18

Example : Use POD in CVD reactor (Ly-Tran ’99)

Schematic representation of a horizontalquartz reactor in a steel confinement shell

Compute state trajectories for one “typical” input :

Snap shots of “typical” states Ten dominant “states”

19

Concluding remarks

• large-scale is typically sparse

• time-stepping (simulation) is cheaper than control (opti-mization)

• find an “energy function” that is “cheap” and project onits dominant features

Future work

• find error bounds “on the fly”

• incorporate projections in closed loop

Further reading

P. Van Dooren, Gramian based model reduction of large-scaledynamical systems, in Numerical Analysis 1999, Chapmanand Hall, pp. 231-247, CRC Press, London, 2000.M. Verlaan and A. Heemink, Tidal flow forecasting using re-duced rank square root filters, Stochastic Hydrology and Hy-draulics, 11, pp. 349-368, 1997.

See also SIAM short course notes onhttp://www.auto.ucl.ac.be/˜vdooren/

20

gramian based model reduction of large-scale dynamical systems · of large-scale dynamical systems...

Documents