TRANSCRIPT
Data Assimilation for High-Dimensional Systems
– Challenges, algorithms and opportunities
Wei Kang, U.S. Naval Postgraduate School
2018 IMA Workshop
Collaboration
• Liang Xu (NRL)
• Sarah King (NRL)
• Kai Sun (UTK)
• Junjian Qi (UCF)
• Isaac Kaminer (NPS)
• Qi Gong (UCSC)
• Lucas Wilcox (NPS)
• Arthur J. Krener (NPS)
• Randy Boucher (Army)
• Data assimilation
• Numerical weather prediction
• Partial observability
• Power systems
• Swarms of autonomous vehicles
• Numerical algorithms
High-Dimensional Systems
Numerical Weather Prediction - ECMWF Global Model
ECMWF Global Model: $8 \times 10^7$ model variables are updated every 12 hours using $1.3 \times 10^7$ observations.
Power System - Eastern Interconnection
EI: the largest electrical grid in North America. A simplified model has more than 25K buses, 28K lines, 8K transformers, and 1,000 generators.
High-Dimensional Systems
Challenges
The scalability of algorithms is limited by several factors: computational load, I/O overhead and required memory size, degree of parallelism, and power consumption.
[Figure: covariance matrix dimension vs. RAM size; the required memory grows from gigabytes through terabytes to petabytes as the dimension increases from $10^2$ to $10^8$.]
Quantitative change becomes qualitative.
Data Assimilation
Liang Xu - NRL
Numerical Weather Prediction
[Diagram: the data assimilation cycle. The observation ($y$) and the background ($x^b$) enter DATA ASSIMILATION, which produces the analysis ($x^a$); the FORECAST MODEL advances the analysis to a forecast, which serves as the next background.]
Data Assimilation
System Model
$$x_{k+1} = \mathcal{M}(x_k) + \eta_k, \quad x_k \in \mathbb{R}^n, \quad \eta_k \sim \text{model uncertainty}, \; Q$$
$$y_k = H(x_k) + \delta_k, \quad y_k \in \mathbb{R}^m, \quad \delta_k \sim \text{sensor noise}, \; R$$
Linearization:
$$x_{k+1} = M_k x_k + \cdots, \qquad y_k = H_k x_k + \cdots$$
Both $n$ and $m$ are large. In daily operations, only a small part of the sensor data is used.
4D-Var
Cost function
$$J(x_0) = \frac{1}{2}(x_0 - x^b)^T (P^b_0)^{-1}(x_0 - x^b) + \frac{1}{2}\sum_{i=0}^{N} (y_i - H(x_i))^T R^{-1} (y_i - H(x_i))$$
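As an illustration, here is a minimal NumPy sketch of evaluating this cost function along a model trajectory; the model `M`, observation operator `H`, and inverse weight matrices are hypothetical stand-ins supplied by the caller.

```python
import numpy as np

def fourdvar_cost(x0, xb, Pb0_inv, ys, M, H, R_inv):
    """Evaluate the 4D-Var cost J(x0) along the trajectory from x0.

    M  : function advancing the state one step, x_{k+1} = M(x_k)
    H  : observation operator mapping state to observation space
    ys : list of observations y_0, ..., y_N
    """
    d = x0 - xb
    J = 0.5 * d @ Pb0_inv @ d            # background term
    x = x0
    for i, y in enumerate(ys):
        if i > 0:
            x = M(x)                      # propagate to time i
        r = y - H(x)                      # innovation at time i
        J += 0.5 * r @ R_inv @ r          # observation term
    return J
```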
4D-Var estimation
$$\min_{x^a_0} J(x^a_0), \qquad x^a_k = \mathcal{M}(x^a_{k-1}), \quad k = 1, 2, \cdots, N$$
The trajectory $x^a_0, x^a_1, \cdots, x^a_N$ is called an analysis; $x^b$ is the initial estimate, or background; $P^b_0$ is a fixed positive definite matrix.
4D-Var
A linear solution
$$x^a_k = x^b_k + g_k, \quad 0 \le k \le N$$
$$g = PH^T\left(HPH^T + R\right)^{-1}(y - Hx^b)$$
$$g = \begin{bmatrix} g_0 \\ \vdots \\ g_N \end{bmatrix}, \quad P = (P_{ij})_{i,j=0}^{N}, \quad H = \mathrm{diag}(H_0, \cdots, H_N), \quad R, \; y, \; x^b, \cdots$$
Or define
$$z = \left(HPH^T + R\right)^{-1}(y - Hx^b)$$
Then
$$x^a_k = x^b_k + g_k, \quad 0 \le k \le N$$
$$g = PH^T z, \qquad \left(HPH^T + R\right)z = y - Hx^b$$
4D-Var
A 4D-Var algorithm
$$f_{N+1} = 0; \qquad \text{for } k = N:0, \quad f_k = M_k^T f_{k+1} + H_k^T z_k$$
$$g_0 = P^b_0 f_0; \qquad \text{for } k = 0:N, \quad g_{k+1} = M_k g_k$$
• $M_k g_k$ and $M_k^T f_k$ are computed using a tangent linear model and an adjoint model.
• The equation $\left(HPH^T + R\right)z = y - Hx^b$ is solved using the conjugate gradient algorithm.
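A minimal sketch of these two sweeps, assuming hypothetical callables `tlm(k, v)` $\approx M_k v$ and `adj(k, v)` $\approx M_k^T v$ for the tangent linear and adjoint models:

```python
import numpy as np

def apply_PHT(z, N, Pb0, Hs, tlm, adj):
    """Compute g = P H^T z via one adjoint sweep and one TLM sweep.

    z   : list of vectors z_0, ..., z_N in observation space
    Hs  : list of observation matrices H_0, ..., H_N
    tlm : tlm(k, v) returns M_k v      (tangent linear model)
    adj : adj(k, v) returns M_k^T v    (adjoint model)
    """
    n = Pb0.shape[0]
    # backward (adjoint) sweep: f_k = M_k^T f_{k+1} + H_k^T z_k
    f = np.zeros(n)
    for k in range(N, -1, -1):
        f = adj(k, f) + Hs[k].T @ z[k]
    # forward (TLM) sweep: g_0 = P^b_0 f_0, g_{k+1} = M_k g_k
    g = [Pb0 @ f]
    for k in range(N):
        g.append(tlm(k, g[-1]))
    return g
```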
4D-Var
• 4D-Var is widely used in today's numerical weather prediction (NWP). It was adopted shortly after the introduction of 3D-Var.
• Historically, 3D-Var and 4D-Var were inspired by optimal control, in which a cost function is minimized (Sasaki (1958), Lorenc (1986)).
• It is an effective method that provides estimation results with an affordable computational load.
• The method does not provide information about the error covariance.
• It requires the development and maintenance of tangent linear models and adjoint models.
EnKF
Ensemble KF
Ensemble: $x^i_{k|k}$, $1 \le i \le N_{ens}$, $N_{ens} \ll n$
Forecast:
$$x^i_{k+1|k} = \mathcal{M}(x^i_{k|k}) + \eta_k, \qquad y^i_{k+1|k} = H(x^i_{k+1|k})$$
$$\bar{x}_{k+1} = \frac{1}{N_{ens}}\sum_{i=1}^{N_{ens}} x^i_{k+1|k}, \qquad \bar{y}_{k+1} = \frac{1}{N_{ens}}\sum_{i=1}^{N_{ens}} y^i_{k+1|k}$$
EnKF
Ensemble KF (cont.)
Analysis:
$$\Delta X = \frac{1}{\sqrt{N_{ens}-1}}\left(\left[x^1_{k+1|k} \cdots x^{N_{ens}}_{k+1|k}\right] - \bar{x}_{k+1}\mathbf{1}^T\right) \quad (n \times N_{ens})$$
$$\Delta Y = \frac{1}{\sqrt{N_{ens}-1}}\left(\left[y^1_{k+1|k} \cdots y^{N_{ens}}_{k+1|k}\right] - \bar{y}_{k+1}\mathbf{1}^T\right) \quad (m \times N_{ens})$$
$$K = \Delta X \Delta Y^T\left(\Delta Y \Delta Y^T + R\right)^{-1} \quad (n \times m)$$
$$x_{k+1|k+1} = \bar{x}_{k+1} + K(y_{k+1} - \bar{y}_{k+1})$$
Update:
$$D = I - \Delta Y^T\left(\Delta Y \Delta Y^T + R\right)^{-1}\Delta Y \quad (N_{ens} \times N_{ens})$$
$$P_{k+1|k+1} = \Delta X D \Delta X^T$$
$$\left[x^1_{k+1|k+1} \; x^2_{k+1|k+1} \cdots x^{N_{ens}}_{k+1|k+1}\right] = x_{k+1|k+1}\mathbf{1}^T + \Delta X\left(\sqrt{(N_{ens}-1)D}\right)^T$$
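For small $n$, the analysis and square-root ensemble update above can be sketched in dense NumPy as follows (illustrative only; operational implementations never form these matrices explicitly):

```python
import numpy as np

def enkf_analysis(Xf, Yf, y_obs, R):
    """One EnKF analysis step from the forecast ensembles.

    Xf : (n, Nens) forecast state ensemble
    Yf : (m, Nens) forecast observation ensemble
    """
    Nens = Xf.shape[1]
    xbar = Xf.mean(axis=1, keepdims=True)
    ybar = Yf.mean(axis=1, keepdims=True)
    dX = (Xf - xbar) / np.sqrt(Nens - 1)
    dY = (Yf - ybar) / np.sqrt(Nens - 1)
    # Kalman gain K = dX dY^T (dY dY^T + R)^{-1}
    S = dY @ dY.T + R
    S_inv = np.linalg.inv(S)
    K = dX @ dY.T @ S_inv
    x_a = xbar.ravel() + K @ (y_obs - ybar.ravel())
    # square-root update: X_a = x_a 1^T + dX (sqrt((Nens-1) D))^T
    D = np.eye(Nens) - dY.T @ S_inv @ dY
    w, V = np.linalg.eigh(D)              # D is symmetric PSD
    sqrtD = V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T
    Xa = x_a[:, None] + dX @ (np.sqrt(Nens - 1) * sqrtD).T
    return x_a, Xa
```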
EnKF
• EnKF does not require a tangent linear model or an adjoint model
• It contains partial information about the error statistics
• Undersampling and rank deficiency
• Filter divergence
• Inbreeding - the analysis error covariance is systematically underestimated
• Spurious correlations - covariance between state components that are not physically related
EnKF
Localization - components physically far away are uncorrelated.
Set $P_{ij} = 0$ for $i$ and $j$ far away. Suppose $y$ is a part of the state variables, at locations $i_1, \cdots, i_m$. Let $\rho$ be a sparse matrix that defines the sparsity of $P$. Define $\rho^{xy} \in \mathbb{R}^{n \times m}$ and $\rho^{yy} \in \mathbb{R}^{m \times m}$ by
$$\rho^{xy}_{st} = \rho_{s\, i_t}, \qquad \rho^{yy}_{st} = \rho_{i_s\, i_t}$$
Then
$$K^{loc} = \rho^{xy} .\!*\, (\Delta X \Delta Y^T)\left(\rho^{yy} .\!*\, (\Delta Y \Delta Y^T) + R\right)^{-1}$$
Inflation - rescale $\Delta X$ and $\Delta Y$ by a small factor.
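A sketch of the localized gain, assuming the observations read state components directly at indices `obs_idx` and that a taper matrix `rho` (for example a distance-based cutoff or a Gaspari-Cohn taper) is supplied by the user:

```python
import numpy as np

def localized_gain(dX, dY, R, obs_idx, rho):
    """Schur-product localization of the EnKF gain.

    dX, dY  : (n, Nens) and (m, Nens) perturbation matrices
    rho     : (n, n) taper matrix encoding the sparsity of P
    obs_idx : indices i_1, ..., i_m of the observed state components
    """
    rho_xy = rho[:, obs_idx]                  # n x m taper
    rho_yy = rho[np.ix_(obs_idx, obs_idx)]    # m x m taper
    Pxy = rho_xy * (dX @ dY.T)                # elementwise (Schur) product
    Pyy = rho_yy * (dY @ dY.T)
    return Pxy @ np.linalg.inv(Pyy + R)
```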
Sparsity-based Filters
Sparsity-based filters: the goal is to avoid rank deficiency, provide more error covariance information, and achieve granularity control for optimal parallelism.
A variety of parallel computing architectures are available, and new technologies are being developed rapidly.
• Multi-core CPU
• General-purpose GPU
• Clusters or massively parallel computing
• Grid computing
• Application-specific integrated circuits
• ......
Sparsity-based Filters
Sparsity-based methods
• Approximately sparse error covariance:
$N_{sp}$ = maximum number of nonzero entries in the columns
$I_i(P)$ = indices of nonzero entries in the $i$th column
• Component-based numerical model:
$\mathcal{M}(x^{sp}_k; I)$ or $\mathcal{M}^{comp}$
$I$ = indices of entries to be evaluated
Sparsity-based Filters
A progressive approach
Assume
$$M_k P_k M_k^T = P_k + \Delta P_{k+1}$$
To estimate $\Delta P_{k+1}$, assume
$$M_k = I + \Delta M_k, \qquad x_{k+1} = \mathcal{M}(x_k) = x_k + \Delta(x_k)$$
Then
$$M_k P_k M_k^T = (I + \Delta M_k)P_k(I + \Delta M_k^T) = P_k + \Delta M_k P_k + (\Delta M_k P_k)^T + \cdots$$
$$\approx \left(\mathcal{M}(x_k + \delta P_k) - \mathcal{M}(x_k)\right)/\delta + \left(\mathcal{M}(x_k + \delta P_k) - \mathcal{M}(x_k)\right)^T/\delta - P_k$$
Sparsity-based Filters
Progressive KF
Background: $x_{k|k}$ and $P^{sp}_{k|k}$ (sparse covariance approximation)
Forecast:
$$x_{k+1|k} = \mathcal{M}(x_{k|k}), \qquad y_{k+1|k} = H(x_{k+1|k})$$
$$P^{sp}_{k+1|k} = \left(\mathcal{M}^{comp}(x^{sp}_{k|k} + \delta P^{sp}_{k|k}) - \mathcal{M}^{comp}(x^{sp}_{k|k})\right)/\delta + \left(\mathcal{M}^{comp}(x^{sp}_{k|k} + \delta P^{sp}_{k|k}) - \mathcal{M}^{comp}(x^{sp}_{k|k})\right)^T/\delta - P^{sp}_{k|k} + Q$$
Analysis:
$$K = P^{sp}_{k+1|k}H_{k+1}^T\left(H_{k+1}P^{sp}_{k+1|k}H_{k+1}^T + R\right)^{-1}$$
$$P^{sp}_{k+1|k+1} = (I - KH_{k+1})P^{sp}_{k+1|k}$$
$$x_{k+1|k+1} = x_{k+1|k} + K(y_{k+1} - y_{k+1|k})$$
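A dense sketch of the progressive covariance forecast above; the component-based variant would evaluate only the entries listed in $I_i(P)$ for each column. The model `M` and the step size `delta` are placeholders.

```python
import numpy as np

def progressive_forecast_cov(M, x, P, Q, delta=1e-4):
    """One progressive covariance forecast step.

    Approximates M_k P M_k^T ≈ A + A^T - P, where
    A[:, i] = (M(x + delta * P[:, i]) - M(x)) / delta.
    Dense for clarity; the sparse version evaluates only the
    components in I_i(P) via a component-based model.
    """
    n = P.shape[0]
    Mx = M(x)
    A = np.empty((n, n))
    for i in range(n):
        A[:, i] = (M(x + delta * P[:, i]) - Mx) / delta
    P_next = A + A.T - P + Q
    return Mx, P_next
```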
Sparsity-based Filters
Computational load (number of model components evaluated)

  Filter                                 | Model evaluations                                                           | Component evaluations
  Progressive KF, full model             | $\mathcal{M}(x_k)$, $\mathcal{M}(x_k + \delta P_k(:,i))$, $i = 1, \ldots, n$ | $(n+1)\,n\,N_p$
  Progressive KF, component-based model  | $\mathcal{M}(x_k)$, $\mathcal{M}(x_k + \delta P_k(:,i),\, I_i(P))$, $i = 1, \ldots, n$ | $(N_{sp}+1)\,n\,N_p$
  Ensemble KF                            | $\mathcal{M}(x^i_k)$, $i = 1, \ldots, N_{ens}$                              | $N_{ens}\,n$

($N_p$ = number of progressive steps)
Sparsity-based Filters
Avoid rank deficiency and achieve granularity control
[Figure: a dense $n \times N_{ens}$ ensemble matrix of rank $N_{ens}$, compared with a banded sparse covariance matrix.]
• $P$ has full rank
• $N_{sp}$ is a variable
• Tasks can be grouped in different sizes: coarse-grained, medium-grained, fine-grained
• $P$ localization is straightforward
Sparsity-based Filters
Lorenz-96 model
$$\frac{dx_i}{dt} = (x_{i+1} - x_{i-2})x_{i-1} - x_i + F, \quad i = 1, 2, \cdots, m, \qquad x_{m+1} = x_1$$
Discretization - 4th-order Runge-Kutta:
$$x_k = \mathcal{M}(x_{k-1}), \qquad \Delta t = 0.025, \quad F = 8, \quad m = 40$$
[Figure: a chaotic Lorenz-96 trajectory, $x$ plotted over $t \in [0, 25]$.]
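A minimal implementation of this model and its RK4 discretization:

```python
import numpy as np

def lorenz96_rhs(x, F=8.0):
    """Lorenz-96 right-hand side with cyclic indexing."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def lorenz96_step(x, dt=0.025, F=8.0):
    """One 4th-order Runge-Kutta step of the Lorenz-96 model."""
    k1 = lorenz96_rhs(x, F)
    k2 = lorenz96_rhs(x + 0.5 * dt * k1, F)
    k3 = lorenz96_rhs(x + 0.5 * dt * k2, F)
    k4 = lorenz96_rhs(x + dt * k3, F)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

# example: spin up a chaotic trajectory from a perturbed rest state
x = 8.0 + 0.1 * np.random.randn(40)
for _ in range(1000):
    x = lorenz96_step(x)
```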
Sparsity-based Filters
A comparison
$N = 1000$ initial states in $[-1, 1]$ - uniform distribution
$N_{filter} = 4000$ filter steps
$m = 20$ measurement locations
$R = I$
  Filter | Size                    | Component evaluations
  EnKF   | $N_{ens} = 10$          | 400
  P-KF   | $N_{sp} = 7$, $N_p = 1$ | 320
  P-KF   | $N_{sp} = 11$, $N_p = 2$ | 480 x 2
  P-KF   | $N_{sp} = 11$, $N_p = 3$ | 480 x 3

[Figure: EnKF vs. Progressive-KF. RMSE boxplots (roughly 0.2 to 0.9) for EnKF, the three PKF configurations, and UKF.]
Sparsity-based Filters
Unscented KF (UKF)
$\sigma$-points: $x^i_{k|k}$, $0 \le i \le 2n$, with $x^0_{k|k} = x_{k|k}$
Forecast:
$$x^i_{k+1|k} = \mathcal{M}(x^i_{k|k}), \qquad y^i_{k+1|k} = H(x^i_{k+1|k}), \quad 0 \le i \le 2n$$
$$\bar{x}_{k+1} = \sum_{i=0}^{2n} w_i x^i_{k+1|k}, \qquad \bar{y}_{k+1} = \sum_{i=0}^{2n} w_i y^i_{k+1|k}$$
$$P_{k+1|k} = \sum_{i=0}^{2n} w_i \Delta x^i_{k+1}(\Delta x^i_{k+1})^T + Q, \qquad \Delta x^i_{k+1} = x^i_{k+1|k} - \bar{x}_{k+1}$$
$$w_0 = \frac{\kappa}{n+\kappa}, \qquad w_i = \frac{1}{2(n+\kappa)}$$
Sparsity-based Filters
UKF (cont.)
Analysis:
$$P^{yy} = \sum_{i=0}^{2n} w_i \Delta y^i_{k+1}(\Delta y^i_{k+1})^T + R, \qquad \Delta y^i_{k+1} = y^i_{k+1|k} - \bar{y}_{k+1}$$
$$P^{xy} = \sum_{i=0}^{2n} w_i \Delta x^i_{k+1}(\Delta y^i_{k+1})^T$$
$$K P^{yy} = P^{xy}, \qquad x_{k+1|k+1} = \bar{x}_{k+1} + K(y_{k+1} - \bar{y}_{k+1})$$
Update:
$$P_{k+1|k+1} = P_{k+1|k} - K(P^{xy})^T$$
$$x^i_{k+1|k+1} = x_{k+1|k+1} + \left(\sqrt{(n+\kappa)P_{k+1|k+1}}\right)_i, \quad i = 1, 2, \cdots, n$$
$$x^i_{k+1|k+1} = x_{k+1|k+1} - \left(\sqrt{(n+\kappa)P_{k+1|k+1}}\right)_{i-n}, \quad i = n+1, \cdots, 2n$$
where $(\cdot)_i$ denotes the $i$th column of the matrix square root.
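For reference, a sketch of sigma-point generation with these weights, using a Cholesky factor as one valid choice of the matrix square root:

```python
import numpy as np

def ukf_sigma_points(x, P, kappa=1.0):
    """Generate the 2n+1 sigma points and weights for state (x, P)."""
    n = x.size
    S = np.linalg.cholesky((n + kappa) * P)   # columns are the offsets
    pts = [x] \
        + [x + S[:, i] for i in range(n)] \
        + [x - S[:, i] for i in range(n)]
    w = np.full(2 * n + 1, 1.0 / (2 * (n + kappa)))
    w[0] = kappa / (n + kappa)                # weights sum to 1
    return np.array(pts), w
```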
Sparsity-based Filters
Sparsity of the square root matrix
Theorem (S. Toledo). If $P$ is a symmetric positive definite matrix, then the amount of storage for a Cholesky decomposition of $P$ is $O(n + 2\eta(P))$, where $\eta(P)$ is the number of nonzero entries in $P$.
Assumption: the sparsity patterns of $P$ and $\sqrt{P}$ are known - $I(P)$, $I(\sqrt{P})$.
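A small numerical illustration of this point: the Cholesky factor of a banded SPD matrix stays within the band, so storage scales with the nonzeros of $P$ rather than $n^2$ (for general sparsity patterns the fill-in depends on the ordering):

```python
import numpy as np

n = 12
# tridiagonal, diagonally dominant SPD matrix
P = 4 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
Lc = np.linalg.cholesky(P)
# the factor is lower bidiagonal: nonzeros stay within the band of P
print(np.count_nonzero(P))                     # 34 nonzeros in P
print(np.count_nonzero(np.abs(Lc) > 1e-12))    # 23 nonzeros in the factor
```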
Sparsity-based Filters
Sparse UKF
Sparse $\sigma$-points: $x^0_{k|k} = x_{k|k}$; $\sigma_i$, $I_i$ (sparsity index), $1 \le i \le n$
Forecast:
$$x^0_{k+1|k} = \mathcal{M}(x^0_{k|k})$$
$$x^i_{k+1|k} = \mathcal{M}^{comp}(x^0_{k|k} + \sigma_i), \qquad x^{i+n}_{k+1|k} = \mathcal{M}^{comp}(x^0_{k|k} - \sigma_i)$$
$$y^i_{k+1|k} = H(x^i_{k+1|k} .I_i\, x^0_{k+1|k}), \quad 1 \le i \le 2n$$
$$\bar{x}_{k+1} = \sum_{i=0}^{2n} w_i \,(x^i_{k+1|k} .I_i\, x^0_{k+1|k}), \qquad \bar{y}_{k+1} = \sum_{i=0}^{2n} w_i y^i_{k+1|k}$$
$$P^{sp}_{k+1|k} = \sum_{i=0}^{2n} w_i \left(\Delta x^i_{k+1}(\Delta x^i_{k+1})^T\right)^{sp} + Q$$
$$w_0 = \frac{\kappa}{n+\kappa}, \qquad w_i = \frac{1}{2(n+\kappa)}, \qquad \Delta x^i_{k+1} = x^i_{k+1|k} .I_i\, x^0_{k+1|k} - \bar{x}_{k+1}$$
Here $x^{sp}_1 .I\, x_2$ denotes the merging operation.
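The merging notation is not spelled out on the slide; one natural reading, assumed in this sketch, is that $x_1 .I\, x_2$ takes the components of $x_1$ on the index set $I$ and those of $x_2$ everywhere else:

```python
import numpy as np

def merge(x1, I, x2):
    """Merging operation x1 .I x2: components of x1 on the index set I,
    components of x2 elsewhere."""
    out = x2.copy()
    out[I] = x1[I]
    return out

# example: a sparse sigma point only carries the entries in I_i;
# the remaining entries are filled from the central point x^0
x0 = np.zeros(6)
xi = np.arange(6, dtype=float)
print(merge(xi, np.array([1, 3]), x0))   # [0. 1. 0. 3. 0. 0.]
```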
Sparsity-based Filters
Sparse UKF (cont.)
Analysis:
$$P^{yy} = \sum_{i=0}^{2n} w_i \Delta y^i_{k+1}(\Delta y^i_{k+1})^T, \qquad \Delta y^i_{k+1} = y^i_{k+1|k} - \bar{y}_{k+1}$$
$$P^{xy} = \sum_{i=0}^{2n} w_i \Delta x^i_{k+1}(\Delta y^i_{k+1})^T$$
$$K P^{yy} = P^{xy}, \qquad x_{k+1|k+1} = \bar{x}_{k+1} + K(y_{k+1} - \bar{y}_{k+1})$$
Update:
$$P^{sp}_{k+1|k+1} = P^{sp}_{k+1|k} - \left(K(P^{xy})^T\right)^{sp}$$
$$\sigma_i, I_i \sim \sqrt{(n+\kappa)P^{sp}_{k+1|k+1}}, \quad i = 1, 2, \cdots, n$$
Sparsity-based Filters
A comparison
$N = 1000$ initial states in $[-1, 1]$ - uniform distribution
$N_{filter} = 4000$ filter steps
$m = 20$ measurement locations
$R = I$

  Filter | Size           | Component evaluations
  EnKF   | $N_{ens} = 10$ | 400
  S-UKF  | $N_{sp} = 7$   | 640
  S-UKF  | $N_{sp} = 11$  | 960

[Figure: EnKF vs. Sparse-UKF. RMSE boxplots (roughly 0.2 to 0.9) for EnKF, the two S-UKF configurations, and UKF.]
Summary
Progressive KF
• Full rank covariance
• Sparse covariance matrix reduces I/O load and memory size.
• Component-based model reduces computational load.
• Requirement: $P_{k+1} \approx P_k + MP_k + P_kM^T$
Sparse UKF
• Full rank covariance
• Sparse covariance matrix reduces I/O load and memory size.
• Component-based model reduces computational load.
• Requirement: Cholesky decomposition (additional computation and memory).
Partial Observability
Partial Observability - A quantitative definition
– $x$ - space variable, $\lambda$ - sensor location, $u(x, t)$ - state trajectory
– $y(t) = h(u(x, t), \lambda)$ - system output (sensor data)
– Background variation (data assimilation cost function):
$$J(u, \delta u, \lambda) = \delta u^T P_1 \delta u + \|y(\cdot, \lambda; u + \delta u) - y(\cdot, \lambda; u)\|_{P_2}$$
Definition. Let $\varepsilon > 0$. The observability ambiguity, $\rho$, is defined as
$$\rho^2 = \max_{\delta u} \|P(\delta u)\|_W$$
subject to: $\|J(u^B, \delta u, \lambda)\| \le \varepsilon$, the system model of $u(x, t)$, and $P(\delta u) \in W$ (the subspace for estimation).
The ratio $\rho/\varepsilon$ is called the unobservability index.
Partial Observability
Empirical Gramian
• Let $\{w_i\}$ be an orthonormal basis of $W$, and let $\Delta y(t)$ be the variation of the output. The Gramian is defined by
$$G_{ij} = \langle w_i, w_j\rangle_{P_1} + \langle \Delta y_i(t), \Delta y_j(t)\rangle_{P_2}.$$
• Let $\sigma_{min}$ be the smallest eigenvalue of $G$; then $1/\sqrt{\sigma_{min}}$ is a first-order approximation of the unobservability index $\rho/\varepsilon$.
• The large scale and high dimension of NWP make it less desirable, or even impossible, to make the entire state space observable. A finite number of modes is enough to provide an accurate approximation.
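A sketch of assembling this Gramian over a finite set of modes, assuming a diagonal $P_2$ weight and a hypothetical `output_var` routine (typically driven by the tangent linear model) that returns the sampled output variation $\Delta y(t)$ for a given mode:

```python
import numpy as np

def empirical_gramian(basis, output_var, P1, P2_weight, dt):
    """Empirical observability Gramian over a mode subspace.

    basis      : list of orthonormal modes w_i (each an n-vector)
    output_var : maps a mode w to its output variation Delta y(t),
                 sampled as an (Nt, m) array
    P2_weight  : diagonal weight for the output inner product
    """
    K = len(basis)
    dys = [output_var(w) for w in basis]
    G = np.empty((K, K))
    for i in range(K):
        for j in range(K):
            # <w_i, w_j>_{P1} + time-integrated <dy_i, dy_j>_{P2}
            G[i, j] = basis[i] @ P1 @ basis[j] \
                      + dt * np.sum(dys[i] * P2_weight * dys[j])
    return G

# first-order unobservability index: 1 / sqrt(sigma_min)
# sigma_min = np.linalg.eigvalsh(G).min()
```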
Shallow Water Equations (SWEs)
Simplest atmospheric flow model:
$$\frac{\partial u}{\partial t} + u\frac{\partial u}{\partial x} + \frac{\partial \phi}{\partial x} + g\frac{\partial P}{\partial x} = 0$$
$$\frac{\partial \phi}{\partial t} + u\frac{\partial \phi}{\partial x} + \phi\frac{\partial u}{\partial x} = 0$$
– $x$ - horizontal distance, $u$ - horizontal velocity, $p$ - fluid depth
– $g$ - gravitational constant, $P(x)$ - height of the obstacle
– $\phi = gp$ - geopotential
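For concreteness, a direct transcription of the right-hand side with periodic boundary conditions and centered differences (illustrative only; a practical solver would add upwinding or filtering for stability):

```python
import numpy as np

def swe_rhs(u, phi, P_x, dx, g=9.81):
    """Right-hand side of the 1-D shallow water equations above,
    periodic in x, discretized with centered differences."""
    ux   = (np.roll(u, -1)   - np.roll(u, 1))   / (2 * dx)
    phix = (np.roll(phi, -1) - np.roll(phi, 1)) / (2 * dx)
    du   = -(u * ux + phix + g * P_x)     # momentum equation
    dphi = -(u * phix + phi * ux)         # geopotential equation
    return du, dphi
```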
[Figures: the initial conditions on $x \in [0, 10]$ and the solution $\phi(x, t)$ over the $(t, x)$ domain.]
Algorithm for Optimal Sensor Placement
Optimal sensor location - minimizing the unobservability index:
$$\max_{\lambda} \; \sigma_{min}(\lambda) \qquad \text{subject to: } \lambda_{lower} < \lambda < \lambda_{upper}$$
where $\sigma_{min}$ is the smallest eigenvalue of $G$.
1. Initialize $\lambda_0$, $P_1$, $P_2$.
2. Approximate the current solution to the maximization problem by computing the smallest eigenvalue $\sigma$ of the Gramian matrix using the TLM.
3. Update $\lambda$ using a BFGS step:
$$m_k(\lambda) = \sigma(\lambda_k) + \left(\frac{d\sigma}{d\lambda}\Big|_{k}\right)^T(\lambda - \lambda_k) + \frac{1}{2}(\lambda - \lambda_k)^T B_k (\lambda - \lambda_k)$$
where $B_k$ is the pseudo-Hessian.
4. Repeat until convergence.
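A sketch of this optimization using SciPy's bounded quasi-Newton routine in place of the talk's custom BFGS with pseudo-Hessian; `gramian_of` is a hypothetical routine that assembles $G(\lambda)$ via the TLM:

```python
import numpy as np
from scipy.optimize import minimize

def neg_sigma_min(lam, gramian_of):
    """Maximize sigma_min(G(lambda)) by minimizing its negative."""
    G = gramian_of(lam)
    return -np.linalg.eigvalsh(G)[0]   # eigvalsh sorts ascending

def place_sensors(lam0, gramian_of, lower, upper):
    """Bounded quasi-Newton search for the sensor locations lambda."""
    bounds = list(zip(lower, upper))
    res = minimize(neg_sigma_min, lam0, args=(gramian_of,),
                   method="L-BFGS-B", bounds=bounds)
    return res.x
```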
Parameters
$N_\lambda = 6$ - six sensors
$N_{coef} = 5$ - number of coefficients for each of $u$, $\phi$
$L = 10$ - length of the $x$ interval
$T = 2.3$ - time interval
$N_t = 250$
$\rho = 0.01$
$R = 1.6 \times 10^{-3}$ - variance in the sensor data
$\sigma^b_u = 5 \times 10^{-5}$ - used to compute $L_c$
$\sigma^b_\phi = 0.02$ - used to compute $L_c$
$$L_c^{-1} = \begin{pmatrix} \sigma^b_u L & 0 \\ 0 & \sigma^b_\phi L \end{pmatrix} \quad \text{(weight matrix } P_1\text{)}$$
where
$$L = I + \gamma^{-1}\left(\frac{c^4}{2\Delta x^4}(L_{xx})^2\right), \qquad L_{xx} = \begin{pmatrix} -2 & 1 & 0 & \cdots & 0 & 1 \\ 1 & -2 & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & & & \vdots \\ 1 & 0 & \cdots & 0 & 1 & -2 \end{pmatrix}$$
Monte Carlo Numerical Experiments
[Figure: sensor locations on $x \in [0, 10]$ for the optimal, equal-spacing, random 1, and random 2 configurations.]
  Config  | ρ/ε   | RMSE of u^a(0) | RMSE of u^a | RMSE of φ^a(0) | RMSE of φ^a
  Optimal | 9.29  | 0.3577         | 0.1174      | 0.5186         | 0.1591
  Equal   | 13.05 | 0.4419         | 0.1484      | 0.5787         | 0.1974
  Rand 1  | 19.08 | 0.4323         | 0.1459      | 0.5734         | 0.1951
  Rand 2  | 14.17 | 0.4245         | 0.1431      | 0.5671         | 0.1914

  Improvement of the optimal configuration over:
  Equal   | 28.8% | 19.1%          | 20.9%       | 10.4%          | 19.4%
  Rand 1  | 51.3% | 17.3%          | 19.5%       | 9.6%           | 18.45%
  Rand 2  | 34.4% | 15.7%          | 17.96%      | 8.6%           | 16.88%
The optimal sensor locations improved the estimation accuracy of the 4D-Var data assimilation for both the observed variable $\phi$ and the unobserved variable $u$.
Some Remarks
• Partial observability and estimation for large-scale systems have also been applied to power systems and networked swarms.
• The unobservability index is a worst-case measure of observability. Other quantitative measures exist for various types of systems and sensor networks.
• User knowledge and observability?
• Estimation of variables observable in a zero-measure set?
Thank you!