TRANSCRIPT
Data Assimilation for High-Dimensional Systems
– Challenges, algorithms and opportunities
Wei Kang, U.S. Naval Postgraduate School
2018 IMA Workshop
Collaboration
• Liang Xu (NRL)
• Sarah King (NRL)
• Kai Sun (UTK)
• Junjian Qi (UCF)
• Isaac Kaminer (NPS)
• Qi Gong (UCSC)
• Lucas Wilcox (NPS)
• Arthur J. Krener (NPS)
• Randy Boucher (Army)
• Data assimilation
• Numerical weather prediction
• Partial observability
• Power systems
• Swarms of autonomous vehicles
• Numerical algorithms
High-Dimensional Systems
Numerical Weather Prediction - ECMWF Global Model
ECMWF Global Model: $8 \times 10^7$ model variables are updated every 12 hours using $1.3 \times 10^7$ observations.
Power System - Eastern Interconnection
EI: the largest electrical grid in North America. A simplified model has more than 25K buses, 28K lines, 8K transformers, and 1,000 generators.
High-Dimensional Systems
Challenges
The scalability of algorithms is limited by several factors: computational load, I/O overhead and required memory size, degree of parallelism, and power consumption.
[Figure: covariance matrix dimension vs. RAM size; the required memory grows from gigabytes through terabytes to petabytes as the dimension increases from $10^2$ to $10^8$.]
Quantitative change becomes qualitative.
Data Assimilation
Liang Xu - NRL
Numerical Weather Prediction
[Diagram: the data assimilation cycle. The observation ($y$) and the background ($x^b$) enter DATA ASSIMILATION, which produces the analysis ($x^a$); the FORECAST MODEL advances the analysis to a forecast, which serves as the next background.]
Data Assimilation
System Model
$$x_{k+1} = \mathcal{M}(x_k) + \eta_k, \quad x_k \in \mathbb{R}^n, \quad \eta_k \sim \text{model uncertainty}, \; Q$$
$$y_k = H(x_k) + \delta_k, \quad y_k \in \mathbb{R}^m, \quad \delta_k \sim \text{sensor noise}, \; R$$
Linearization:
$$x_{k+1} = M_k x_k + \cdots, \qquad y_k = H_k x_k + \cdots$$
Both $n$ and $m$ are large. In daily operations, only a small part of the sensor data is used.
4D-Var
Cost function
$$J(x_0) = \frac{1}{2}(x_0 - x^b)^T (P^b_0)^{-1}(x_0 - x^b) + \frac{1}{2}\sum_{i=0}^{N} (y_i - H(x_i))^T R^{-1} (y_i - H(x_i))$$
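As an illustration, here is a minimal NumPy sketch of evaluating this cost function along a model trajectory; the model `M`, observation operator `H`, and inverse weight matrices are hypothetical stand-ins supplied by the caller.

```python
import numpy as np

def fourdvar_cost(x0, xb, Pb0_inv, ys, M, H, R_inv):
    """Evaluate the 4D-Var cost J(x0) along the trajectory from x0.

    M  : function advancing the state one step, x_{k+1} = M(x_k)
    H  : observation operator mapping state to observation space
    ys : list of observations y_0, ..., y_N
    """
    d = x0 - xb
    J = 0.5 * d @ Pb0_inv @ d            # background term
    x = x0
    for i, y in enumerate(ys):
        if i > 0:
            x = M(x)                      # propagate to time i
        r = y - H(x)                      # innovation at time i
        J += 0.5 * r @ R_inv @ r          # observation term
    return J
```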
4D-Var estimation
$$\min_{x^a_0} J(x^a_0), \qquad x^a_k = \mathcal{M}(x^a_{k-1}), \quad k = 1, 2, \cdots, N$$
The trajectory $x^a_0, x^a_1, \cdots, x^a_N$ is called an analysis; $x^b$ is the initial estimate, or background; $P^b_0$ is a fixed positive definite matrix.
4D-Var
A linear solution
$$x^a_k = x^b_k + g_k, \quad 0 \le k \le N$$
$$g = PH^T\left(HPH^T + R\right)^{-1}(y - Hx^b)$$
$$g = \begin{bmatrix} g_0 \\ \vdots \\ g_N \end{bmatrix}, \quad P = (P_{ij})_{i,j=0}^{N}, \quad H = \mathrm{diag}(H_0, \cdots, H_N), \quad R, \; y, \; x^b, \cdots$$
Or define
$$z = \left(HPH^T + R\right)^{-1}(y - Hx^b)$$
Then
$$x^a_k = x^b_k + g_k, \quad 0 \le k \le N$$
$$g = PH^T z, \qquad \left(HPH^T + R\right)z = y - Hx^b$$
4D-Var
A 4D-Var algorithm
$$f_{N+1} = 0; \qquad \text{for } k = N:0, \quad f_k = M_k^T f_{k+1} + H_k^T z_k$$
$$g_0 = P^b_0 f_0; \qquad \text{for } k = 0:N, \quad g_{k+1} = M_k g_k$$
• $M_k g_k$ and $M_k^T f_k$ are computed using a tangent linear model and an adjoint model.
• The equation $\left(HPH^T + R\right)z = y - Hx^b$ is solved using the conjugate gradient algorithm.
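A minimal sketch of these two sweeps, assuming hypothetical callables `tlm(k, v)` $\approx M_k v$ and `adj(k, v)` $\approx M_k^T v$ for the tangent linear and adjoint models:

```python
import numpy as np

def apply_PHT(z, N, Pb0, Hs, tlm, adj):
    """Compute g = P H^T z via one adjoint sweep and one TLM sweep.

    z   : list of vectors z_0, ..., z_N in observation space
    Hs  : list of observation matrices H_0, ..., H_N
    tlm : tlm(k, v) returns M_k v      (tangent linear model)
    adj : adj(k, v) returns M_k^T v    (adjoint model)
    """
    n = Pb0.shape[0]
    # backward (adjoint) sweep: f_k = M_k^T f_{k+1} + H_k^T z_k
    f = np.zeros(n)
    for k in range(N, -1, -1):
        f = adj(k, f) + Hs[k].T @ z[k]
    # forward (TLM) sweep: g_0 = P^b_0 f_0, g_{k+1} = M_k g_k
    g = [Pb0 @ f]
    for k in range(N):
        g.append(tlm(k, g[-1]))
    return g
```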
4D-Var
• 4D-Var is widely used in today's numerical weather prediction (NWP). It was adopted shortly after the introduction of 3D-Var.
• Historically, 3D-Var and 4D-Var were inspired by optimal control, in which a cost function is minimized (Sasaki (1958), Lorenc (1986)).
• It is an effective method that provides estimation results with an affordable computational load.
• The method does not provide information about the error covariance.
• It requires the development and maintenance of tangent linear models and adjoint models.
EnKF
Ensemble KF
Ensemble: $x^i_{k|k}$, $1 \le i \le N_{ens}$, $N_{ens} \ll n$
Forecast:
$$x^i_{k+1|k} = \mathcal{M}(x^i_{k|k}) + \eta_k, \qquad y^i_{k+1|k} = H(x^i_{k+1|k})$$
$$\bar{x}_{k+1} = \frac{1}{N_{ens}}\sum_{i=1}^{N_{ens}} x^i_{k+1|k}, \qquad \bar{y}_{k+1} = \frac{1}{N_{ens}}\sum_{i=1}^{N_{ens}} y^i_{k+1|k}$$
EnKF
Ensemble KF (cont.)
Analysis:
$$\Delta X = \frac{1}{\sqrt{N_{ens}-1}}\left(\left[x^1_{k+1|k} \cdots x^{N_{ens}}_{k+1|k}\right] - \bar{x}_{k+1}\mathbf{1}^T\right) \quad (n \times N_{ens})$$
$$\Delta Y = \frac{1}{\sqrt{N_{ens}-1}}\left(\left[y^1_{k+1|k} \cdots y^{N_{ens}}_{k+1|k}\right] - \bar{y}_{k+1}\mathbf{1}^T\right) \quad (m \times N_{ens})$$
$$K = \Delta X \Delta Y^T\left(\Delta Y \Delta Y^T + R\right)^{-1} \quad (n \times m)$$
$$x_{k+1|k+1} = \bar{x}_{k+1} + K(y_{k+1} - \bar{y}_{k+1})$$
Update:
$$D = I - \Delta Y^T\left(\Delta Y \Delta Y^T + R\right)^{-1}\Delta Y \quad (N_{ens} \times N_{ens})$$
$$P_{k+1|k+1} = \Delta X D \Delta X^T$$
$$\left[x^1_{k+1|k+1} \; x^2_{k+1|k+1} \cdots x^{N_{ens}}_{k+1|k+1}\right] = x_{k+1|k+1}\mathbf{1}^T + \Delta X\left(\sqrt{(N_{ens}-1)D}\right)^T$$
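For small $n$, the analysis and square-root ensemble update above can be sketched in dense NumPy as follows (illustrative only; operational implementations never form these matrices explicitly):

```python
import numpy as np

def enkf_analysis(Xf, Yf, y_obs, R):
    """One EnKF analysis step from the forecast ensembles.

    Xf : (n, Nens) forecast state ensemble
    Yf : (m, Nens) forecast observation ensemble
    """
    Nens = Xf.shape[1]
    xbar = Xf.mean(axis=1, keepdims=True)
    ybar = Yf.mean(axis=1, keepdims=True)
    dX = (Xf - xbar) / np.sqrt(Nens - 1)
    dY = (Yf - ybar) / np.sqrt(Nens - 1)
    # Kalman gain K = dX dY^T (dY dY^T + R)^{-1}
    S = dY @ dY.T + R
    S_inv = np.linalg.inv(S)
    K = dX @ dY.T @ S_inv
    x_a = xbar.ravel() + K @ (y_obs - ybar.ravel())
    # square-root update: X_a = x_a 1^T + dX (sqrt((Nens-1) D))^T
    D = np.eye(Nens) - dY.T @ S_inv @ dY
    w, V = np.linalg.eigh(D)              # D is symmetric PSD
    sqrtD = V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T
    Xa = x_a[:, None] + dX @ (np.sqrt(Nens - 1) * sqrtD).T
    return x_a, Xa
```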
EnKF
• EnKF does not require a tangent linear model or an adjoint model
• It contains partial information about the error statistics
• Undersampling and rank deficiency
• Filter divergence
• Inbreeding - the analysis error covariance is systematically underestimated
• Spurious correlations - covariance between state components that are not physically related
EnKF
Localization - components physically far away are uncorrelated.
Set $P_{ij} = 0$ for $i$ and $j$ far away. Suppose $y$ is a part of the state variables, at locations $i_1, \cdots, i_m$. Let $\rho$ be a sparse matrix that defines the sparsity of $P$. Define $\rho^{xy} \in \mathbb{R}^{n \times m}$ and $\rho^{yy} \in \mathbb{R}^{m \times m}$ by
$$\rho^{xy}_{st} = \rho_{s\, i_t}, \qquad \rho^{yy}_{st} = \rho_{i_s\, i_t}$$
Then
$$K^{loc} = \rho^{xy} .\!*\, (\Delta X \Delta Y^T)\left(\rho^{yy} .\!*\, (\Delta Y \Delta Y^T) + R\right)^{-1}$$
Inflation - rescale $\Delta X$ and $\Delta Y$ by a small factor.
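A sketch of the localized gain, assuming the observations read state components directly at indices `obs_idx` and that a taper matrix `rho` (for example a distance-based cutoff or a Gaspari-Cohn taper) is supplied by the user:

```python
import numpy as np

def localized_gain(dX, dY, R, obs_idx, rho):
    """Schur-product localization of the EnKF gain.

    dX, dY  : (n, Nens) and (m, Nens) perturbation matrices
    rho     : (n, n) taper matrix encoding the sparsity of P
    obs_idx : indices i_1, ..., i_m of the observed state components
    """
    rho_xy = rho[:, obs_idx]                  # n x m taper
    rho_yy = rho[np.ix_(obs_idx, obs_idx)]    # m x m taper
    Pxy = rho_xy * (dX @ dY.T)                # elementwise (Schur) product
    Pyy = rho_yy * (dY @ dY.T)
    return Pxy @ np.linalg.inv(Pyy + R)
```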
Sparsity-based Filters
Sparsity-based filters: the goal is to avoid rank deficiency, provide more error covariance information, and achieve granularity control for optimal parallelism.
A variety of parallel computing architectures are available, and new technologies are being developed rapidly.
• Multi-core CPU
• General-purpose GPU
• Clusters or massively parallel computing
• Grid computing
• Application-specific integrated circuits
• ......
Sparsity-based Filters
Sparsity-based methods
• Approximately sparse error covariance:
$N_{sp}$ = maximum number of nonzero entries in the columns
$I_i(P)$ = indices of nonzero entries in the $i$th column
• Component-based numerical model:
$\mathcal{M}(x^{sp}_k; I)$ or $\mathcal{M}^{comp}$
$I$ = indices of entries to be evaluated
Sparsity-based Filters
A progressive approach
Assume
$$M_k P_k M_k^T = P_k + \Delta P_{k+1}$$
To estimate $\Delta P_{k+1}$, assume
$$M_k = I + \Delta M_k, \qquad x_{k+1} = \mathcal{M}(x_k) = x_k + \Delta(x_k)$$
Then
$$M_k P_k M_k^T = (I + \Delta M_k)P_k(I + \Delta M_k^T) = P_k + \Delta M_k P_k + (\Delta M_k P_k)^T + \cdots$$
$$\approx \left(\mathcal{M}(x_k + \delta P_k) - \mathcal{M}(x_k)\right)/\delta + \left(\mathcal{M}(x_k + \delta P_k) - \mathcal{M}(x_k)\right)^T/\delta - P_k$$
Sparsity-based Filters
Progressive KF
Background: $x_{k|k}$ and $P^{sp}_{k|k}$ (sparse covariance approximation)
Forecast:
$$x_{k+1|k} = \mathcal{M}(x_{k|k}), \qquad y_{k+1|k} = H(x_{k+1|k})$$
$$P^{sp}_{k+1|k} = \left(\mathcal{M}^{comp}(x^{sp}_{k|k} + \delta P^{sp}_{k|k}) - \mathcal{M}^{comp}(x^{sp}_{k|k})\right)/\delta + \left(\mathcal{M}^{comp}(x^{sp}_{k|k} + \delta P^{sp}_{k|k}) - \mathcal{M}^{comp}(x^{sp}_{k|k})\right)^T/\delta - P^{sp}_{k|k} + Q$$
Analysis:
$$K = P^{sp}_{k+1|k}H_{k+1}^T\left(H_{k+1}P^{sp}_{k+1|k}H_{k+1}^T + R\right)^{-1}$$
$$P^{sp}_{k+1|k+1} = (I - KH_{k+1})P^{sp}_{k+1|k}$$
$$x_{k+1|k+1} = x_{k+1|k} + K(y_{k+1} - y_{k+1|k})$$
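A dense sketch of the progressive covariance forecast above; the component-based variant would evaluate only the entries listed in $I_i(P)$ for each column. The model `M` and the step size `delta` are placeholders.

```python
import numpy as np

def progressive_forecast_cov(M, x, P, Q, delta=1e-4):
    """One progressive covariance forecast step.

    Approximates M_k P M_k^T ≈ A + A^T - P, where
    A[:, i] = (M(x + delta * P[:, i]) - M(x)) / delta.
    Dense for clarity; the sparse version evaluates only the
    components in I_i(P) via a component-based model.
    """
    n = P.shape[0]
    Mx = M(x)
    A = np.empty((n, n))
    for i in range(n):
        A[:, i] = (M(x + delta * P[:, i]) - Mx) / delta
    P_next = A + A.T - P + Q
    return Mx, P_next
```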
Sparsity-based Filters
Computational load (number of model components evaluated)

  Filter                                 | Model evaluations                                                           | Component evaluations
  Progressive KF, full model             | $\mathcal{M}(x_k)$, $\mathcal{M}(x_k + \delta P_k(:,i))$, $i = 1, \ldots, n$ | $(n+1)\,n\,N_p$
  Progressive KF, component-based model  | $\mathcal{M}(x_k)$, $\mathcal{M}(x_k + \delta P_k(:,i),\, I_i(P))$, $i = 1, \ldots, n$ | $(N_{sp}+1)\,n\,N_p$
  Ensemble KF                            | $\mathcal{M}(x^i_k)$, $i = 1, \ldots, N_{ens}$                              | $N_{ens}\,n$

($N_p$ = number of progressive steps)
Sparsity-based Filters
Avoid rank deficiency and achieve granularity control
[Figure: a dense $n \times N_{ens}$ ensemble matrix of rank $N_{ens}$, compared with a banded sparse covariance matrix.]
• $P$ has full rank
• $N_{sp}$ is a variable
• Tasks can be grouped in different sizes: coarse-grained, medium-grained, fine-grained
• $P$ localization is straightforward
Sparsity-based Filters
Lorenz-96 model
$$\frac{dx_i}{dt} = (x_{i+1} - x_{i-2})x_{i-1} - x_i + F, \quad i = 1, 2, \cdots, m, \qquad x_{m+1} = x_1$$
Discretization - 4th-order Runge-Kutta:
$$x_k = \mathcal{M}(x_{k-1}), \qquad \Delta t = 0.025, \quad F = 8, \quad m = 40$$
[Figure: a chaotic Lorenz-96 trajectory, $x$ plotted over $t \in [0, 25]$.]
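A minimal implementation of this model and its RK4 discretization:

```python
import numpy as np

def lorenz96_rhs(x, F=8.0):
    """Lorenz-96 right-hand side with cyclic indexing."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def lorenz96_step(x, dt=0.025, F=8.0):
    """One 4th-order Runge-Kutta step of the Lorenz-96 model."""
    k1 = lorenz96_rhs(x, F)
    k2 = lorenz96_rhs(x + 0.5 * dt * k1, F)
    k3 = lorenz96_rhs(x + 0.5 * dt * k2, F)
    k4 = lorenz96_rhs(x + dt * k3, F)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

# example: spin up a chaotic trajectory from a perturbed rest state
x = 8.0 + 0.1 * np.random.randn(40)
for _ in range(1000):
    x = lorenz96_step(x)
```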
Sparsity-based Filters
A comparison
$N = 1000$ initial states in $[-1, 1]$ - uniform distribution
$N_{filter} = 4000$ filter steps
$m = 20$ measurement locations
$R = I$
  Filter | Size                    | Component evaluations
  EnKF   | $N_{ens} = 10$          | 400
  P-KF   | $N_{sp} = 7$, $N_p = 1$ | 320
  P-KF   | $N_{sp} = 11$, $N_p = 2$ | 480 x 2
  P-KF   | $N_{sp} = 11$, $N_p = 3$ | 480 x 3

[Figure: EnKF vs. Progressive-KF. RMSE boxplots (roughly 0.2 to 0.9) for EnKF, the three PKF configurations, and UKF.]
Sparsity-based Filters
Unscented KF (UKF)
$\sigma$-points: $x^i_{k|k}$, $0 \le i \le 2n$, with $x^0_{k|k} = x_{k|k}$
Forecast:
$$x^i_{k+1|k} = \mathcal{M}(x^i_{k|k}), \qquad y^i_{k+1|k} = H(x^i_{k+1|k}), \quad 0 \le i \le 2n$$
$$\bar{x}_{k+1} = \sum_{i=0}^{2n} w_i x^i_{k+1|k}, \qquad \bar{y}_{k+1} = \sum_{i=0}^{2n} w_i y^i_{k+1|k}$$
$$P_{k+1|k} = \sum_{i=0}^{2n} w_i \Delta x^i_{k+1}(\Delta x^i_{k+1})^T + Q, \qquad \Delta x^i_{k+1} = x^i_{k+1|k} - \bar{x}_{k+1}$$
$$w_0 = \frac{\kappa}{n+\kappa}, \qquad w_i = \frac{1}{2(n+\kappa)}$$
Sparsity-based Filters
UKF (cont.)
Analysis:
$$P^{yy} = \sum_{i=0}^{2n} w_i \Delta y^i_{k+1}(\Delta y^i_{k+1})^T + R, \qquad \Delta y^i_{k+1} = y^i_{k+1|k} - \bar{y}_{k+1}$$
$$P^{xy} = \sum_{i=0}^{2n} w_i \Delta x^i_{k+1}(\Delta y^i_{k+1})^T$$
$$K P^{yy} = P^{xy}, \qquad x_{k+1|k+1} = \bar{x}_{k+1} + K(y_{k+1} - \bar{y}_{k+1})$$
Update:
$$P_{k+1|k+1} = P_{k+1|k} - K(P^{xy})^T$$
$$x^i_{k+1|k+1} = x_{k+1|k+1} + \left(\sqrt{(n+\kappa)P_{k+1|k+1}}\right)_i, \quad i = 1, 2, \cdots, n$$
$$x^i_{k+1|k+1} = x_{k+1|k+1} - \left(\sqrt{(n+\kappa)P_{k+1|k+1}}\right)_{i-n}, \quad i = n+1, \cdots, 2n$$
where $(\cdot)_i$ denotes the $i$th column of the matrix square root.
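For reference, a sketch of sigma-point generation with these weights, using a Cholesky factor as one valid choice of the matrix square root:

```python
import numpy as np

def ukf_sigma_points(x, P, kappa=1.0):
    """Generate the 2n+1 sigma points and weights for state (x, P)."""
    n = x.size
    S = np.linalg.cholesky((n + kappa) * P)   # columns are the offsets
    pts = [x] \
        + [x + S[:, i] for i in range(n)] \
        + [x - S[:, i] for i in range(n)]
    w = np.full(2 * n + 1, 1.0 / (2 * (n + kappa)))
    w[0] = kappa / (n + kappa)                # weights sum to 1
    return np.array(pts), w
```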
Sparsity-based Filters
Sparsity of the square root matrix
Theorem (S. Toledo). If $P$ is a symmetric positive definite matrix, then the amount of storage for a Cholesky decomposition of $P$ is $O(n + 2\eta(P))$, where $\eta(P)$ is the number of nonzero entries in $P$.
Assumption: the sparsity patterns of $P$ and $\sqrt{P}$ are known - $I(P)$, $I(\sqrt{P})$.
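A small numerical illustration of this point: the Cholesky factor of a banded SPD matrix stays within the band, so storage scales with the nonzeros of $P$ rather than $n^2$ (for general sparsity patterns the fill-in depends on the ordering):

```python
import numpy as np

n = 12
# tridiagonal, diagonally dominant SPD matrix
P = 4 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
Lc = np.linalg.cholesky(P)
# the factor is lower bidiagonal: nonzeros stay within the band of P
print(np.count_nonzero(P))                     # 34 nonzeros in P
print(np.count_nonzero(np.abs(Lc) > 1e-12))    # 23 nonzeros in the factor
```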
Sparsity-based Filters
Sparse UKF
Sparse $\sigma$-points: $x^0_{k|k} = x_{k|k}$; $\sigma_i$, $I_i$ (sparsity index), $1 \le i \le n$
Forecast:
$$x^0_{k+1|k} = \mathcal{M}(x^0_{k|k})$$
$$x^i_{k+1|k} = \mathcal{M}^{comp}(x^0_{k|k} + \sigma_i), \qquad x^{i+n}_{k+1|k} = \mathcal{M}^{comp}(x^0_{k|k} - \sigma_i)$$
$$y^i_{k+1|k} = H(x^i_{k+1|k} .I_i\, x^0_{k+1|k}), \quad 1 \le i \le 2n$$
$$\bar{x}_{k+1} = \sum_{i=0}^{2n} w_i \,(x^i_{k+1|k} .I_i\, x^0_{k+1|k}), \qquad \bar{y}_{k+1} = \sum_{i=0}^{2n} w_i y^i_{k+1|k}$$
$$P^{sp}_{k+1|k} = \sum_{i=0}^{2n} w_i \left(\Delta x^i_{k+1}(\Delta x^i_{k+1})^T\right)^{sp} + Q$$
$$w_0 = \frac{\kappa}{n+\kappa}, \qquad w_i = \frac{1}{2(n+\kappa)}, \qquad \Delta x^i_{k+1} = x^i_{k+1|k} .I_i\, x^0_{k+1|k} - \bar{x}_{k+1}$$
Here $x^{sp}_1 .I\, x_2$ denotes the merging operation.
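The merging notation is not spelled out on the slide; one natural reading, assumed in this sketch, is that $x_1 .I\, x_2$ takes the components of $x_1$ on the index set $I$ and those of $x_2$ everywhere else:

```python
import numpy as np

def merge(x1, I, x2):
    """Merging operation x1 .I x2: components of x1 on the index set I,
    components of x2 elsewhere."""
    out = x2.copy()
    out[I] = x1[I]
    return out

# example: a sparse sigma point only carries the entries in I_i;
# the remaining entries are filled from the central point x^0
x0 = np.zeros(6)
xi = np.arange(6, dtype=float)
print(merge(xi, np.array([1, 3]), x0))   # [0. 1. 0. 3. 0. 0.]
```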
Sparsity-based Filters
Sparse UKF (cont.)
Analysis:
$$P^{yy} = \sum_{i=0}^{2n} w_i \Delta y^i_{k+1}(\Delta y^i_{k+1})^T, \qquad \Delta y^i_{k+1} = y^i_{k+1|k} - \bar{y}_{k+1}$$
$$P^{xy} = \sum_{i=0}^{2n} w_i \Delta x^i_{k+1}(\Delta y^i_{k+1})^T$$
$$K P^{yy} = P^{xy}, \qquad x_{k+1|k+1} = \bar{x}_{k+1} + K(y_{k+1} - \bar{y}_{k+1})$$
Update:
$$P^{sp}_{k+1|k+1} = P^{sp}_{k+1|k} - \left(K(P^{xy})^T\right)^{sp}$$
$$\sigma_i, I_i \sim \sqrt{(n+\kappa)P^{sp}_{k+1|k+1}}, \quad i = 1, 2, \cdots, n$$
Sparsity-based Filters
A comparison
$N = 1000$ initial states in $[-1, 1]$ - uniform distribution
$N_{filter} = 4000$ filter steps
$m = 20$ measurement locations
$R = I$

  Filter | Size           | Component evaluations
  EnKF   | $N_{ens} = 10$ | 400
  S-UKF  | $N_{sp} = 7$   | 640
  S-UKF  | $N_{sp} = 11$  | 960

[Figure: EnKF vs. Sparse-UKF. RMSE boxplots (roughly 0.2 to 0.9) for EnKF, the two S-UKF configurations, and UKF.]
Summary
Progressive KF
• Full rank covariance
• Sparse covariance matrix reduces I/O load and memory size.
• Component-based model reduces computational load.
• Requirement: $P_{k+1} \approx P_k + MP_k + P_kM^T$
Sparse UKF
• Full rank covariance
• Sparse covariance matrix reduces I/O load and memory size.
• Component-based model reduces computational load.
• Requirement: Cholesky decomposition (additional computation and memory).
Partial Observability
Partial Observability - A quantitative definition
– $x$ - space variable, $\lambda$ - sensor location, $u(x, t)$ - state trajectory
– $y(t) = h(u(x, t), \lambda)$ - system output (sensor data)
– Background variation (data assimilation cost function):
$$J(u, \delta u, \lambda) = \delta u^T P_1 \delta u + \|y(\cdot, \lambda; u + \delta u) - y(\cdot, \lambda; u)\|_{P_2}$$
Definition. Let $\varepsilon > 0$. The observability ambiguity, $\rho$, is defined as
$$\rho^2 = \max_{\delta u} \|P(\delta u)\|_W$$
subject to: $\|J(u^B, \delta u, \lambda)\| \le \varepsilon$, the system model of $u(x, t)$, and $P(\delta u) \in W$ (the subspace for estimation).
The ratio $\rho/\varepsilon$ is called the unobservability index.
Partial Observability
Empirical Gramian
• Let $\{w_i\}$ be an orthonormal basis of $W$, and let $\Delta y(t)$ be the variation of the output. The Gramian is defined by
$$G_{ij} = \langle w_i, w_j\rangle_{P_1} + \langle \Delta y_i(t), \Delta y_j(t)\rangle_{P_2}.$$
• Let $\sigma_{min}$ be the smallest eigenvalue of $G$; then $1/\sqrt{\sigma_{min}}$ is a first-order approximation of the unobservability index $\rho/\varepsilon$.
• The large scale and high dimension of NWP make it less desirable, or even impossible, to make the entire state space observable. A finite number of modes is enough to provide an accurate approximation.
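A sketch of assembling this Gramian over a finite set of modes, assuming a diagonal $P_2$ weight and a hypothetical `output_var` routine (typically driven by the tangent linear model) that returns the sampled output variation $\Delta y(t)$ for a given mode:

```python
import numpy as np

def empirical_gramian(basis, output_var, P1, P2_weight, dt):
    """Empirical observability Gramian over a mode subspace.

    basis      : list of orthonormal modes w_i (each an n-vector)
    output_var : maps a mode w to its output variation Delta y(t),
                 sampled as an (Nt, m) array
    P2_weight  : diagonal weight for the output inner product
    """
    K = len(basis)
    dys = [output_var(w) for w in basis]
    G = np.empty((K, K))
    for i in range(K):
        for j in range(K):
            # <w_i, w_j>_{P1} + time-integrated <dy_i, dy_j>_{P2}
            G[i, j] = basis[i] @ P1 @ basis[j] \
                      + dt * np.sum(dys[i] * P2_weight * dys[j])
    return G

# first-order unobservability index: 1 / sqrt(sigma_min)
# sigma_min = np.linalg.eigvalsh(G).min()
```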
Shallow Water Equations (SWEs)
Simplest atmospheric flow model:
$$\frac{\partial u}{\partial t} + u\frac{\partial u}{\partial x} + \frac{\partial \phi}{\partial x} + g\frac{\partial P}{\partial x} = 0$$
$$\frac{\partial \phi}{\partial t} + u\frac{\partial \phi}{\partial x} + \phi\frac{\partial u}{\partial x} = 0$$
– $x$ - horizontal distance, $u$ - horizontal velocity, $p$ - fluid depth
– $g$ - gravitational constant, $P(x)$ - height of the obstacle
– $\phi = gp$ - geopotential
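For concreteness, a direct transcription of the right-hand side with periodic boundary conditions and centered differences (illustrative only; a practical solver would add upwinding or filtering for stability):

```python
import numpy as np

def swe_rhs(u, phi, P_x, dx, g=9.81):
    """Right-hand side of the 1-D shallow water equations above,
    periodic in x, discretized with centered differences."""
    ux   = (np.roll(u, -1)   - np.roll(u, 1))   / (2 * dx)
    phix = (np.roll(phi, -1) - np.roll(phi, 1)) / (2 * dx)
    du   = -(u * ux + phix + g * P_x)     # momentum equation
    dphi = -(u * phix + phi * ux)         # geopotential equation
    return du, dphi
```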
[Figures: the initial conditions on $x \in [0, 10]$ and the solution $\phi(x, t)$ over the $(t, x)$ domain.]
Algorithm for Optimal Sensor Placement
Optimal sensor location - minimizing the unobservability index:
$$\max_{\lambda} \; \sigma_{min}(\lambda) \qquad \text{subject to: } \lambda_{lower} < \lambda < \lambda_{upper}$$
where $\sigma_{min}$ is the smallest eigenvalue of $G$.
1. Initialize $\lambda_0$, $P_1$, $P_2$.
2. Approximate the current solution to the maximization problem by computing the smallest eigenvalue $\sigma$ of the Gramian matrix using the TLM.
3. Update $\lambda$ using a BFGS step:
$$m_k(\lambda) = \sigma(\lambda_k) + \left(\frac{d\sigma}{d\lambda}\Big|_{k}\right)^T(\lambda - \lambda_k) + \frac{1}{2}(\lambda - \lambda_k)^T B_k (\lambda - \lambda_k)$$
where $B_k$ is the pseudo-Hessian.
4. Repeat until convergence.
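A sketch of this optimization using SciPy's bounded quasi-Newton routine in place of the talk's custom BFGS with pseudo-Hessian; `gramian_of` is a hypothetical routine that assembles $G(\lambda)$ via the TLM:

```python
import numpy as np
from scipy.optimize import minimize

def neg_sigma_min(lam, gramian_of):
    """Maximize sigma_min(G(lambda)) by minimizing its negative."""
    G = gramian_of(lam)
    return -np.linalg.eigvalsh(G)[0]   # eigvalsh sorts ascending

def place_sensors(lam0, gramian_of, lower, upper):
    """Bounded quasi-Newton search for the sensor locations lambda."""
    bounds = list(zip(lower, upper))
    res = minimize(neg_sigma_min, lam0, args=(gramian_of,),
                   method="L-BFGS-B", bounds=bounds)
    return res.x
```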
Parameters
$N_\lambda = 6$ - six sensors
$N_{coef} = 5$ - number of coefficients for each of $u$, $\phi$
$L = 10$ - length of the $x$ interval
$T = 2.3$ - time interval
$N_t = 250$
$\rho = 0.01$
$R = 1.6 \times 10^{-3}$ - variance in the sensor data
$\sigma^b_u = 5 \times 10^{-5}$ - used to compute $L_c$
$\sigma^b_\phi = 0.02$ - used to compute $L_c$
$$L_c^{-1} = \begin{pmatrix} \sigma^b_u L & 0 \\ 0 & \sigma^b_\phi L \end{pmatrix} \quad \text{(weight matrix } P_1\text{)}$$
where
$$L = I + \gamma^{-1}\left(\frac{c^4}{2\Delta x^4}(L_{xx})^2\right), \qquad L_{xx} = \begin{pmatrix} -2 & 1 & 0 & \cdots & 0 & 1 \\ 1 & -2 & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & & & \vdots \\ 1 & 0 & \cdots & 0 & 1 & -2 \end{pmatrix}$$
Monte Carlo Numerical Experiments
[Figure: sensor locations on $x \in [0, 10]$ for the optimal, equal-spacing, random 1, and random 2 configurations.]
  Config  | ρ/ε   | RMSE of u^a(0) | RMSE of u^a | RMSE of φ^a(0) | RMSE of φ^a
  Optimal | 9.29  | 0.3577         | 0.1174      | 0.5186         | 0.1591
  Equal   | 13.05 | 0.4419         | 0.1484      | 0.5787         | 0.1974
  Rand 1  | 19.08 | 0.4323         | 0.1459      | 0.5734         | 0.1951
  Rand 2  | 14.17 | 0.4245         | 0.1431      | 0.5671         | 0.1914

  Improvement of the optimal configuration over:
  Equal   | 28.8% | 19.1%          | 20.9%       | 10.4%          | 19.4%
  Rand 1  | 51.3% | 17.3%          | 19.5%       | 9.6%           | 18.45%
  Rand 2  | 34.4% | 15.7%          | 17.96%      | 8.6%           | 16.88%
The optimal sensor locations improved the estimation accuracy of the 4D-Var data assimilation for both the observed variable $\phi$ and the unobserved variable $u$.
Some Remarks
• Partial observability and estimation for large-scale systems have also been applied to power systems and networked swarms.
• The unobservability index is a worst-case measure of observability. Other quantitative measures exist for various types of systems and sensor networks.
• User knowledge and observability?
• Estimation of variables observable in a zero-measure set?
Thank you!