Variational data assimilation and forecast error statistics
Ross Bannister, 11th July 2011, University of Reading, [email protected]
“All models are wrong …” (George Box)
“All models are wrong and all observations are inaccurate” (a data assimilator)
Distinction between ‘errors’ and ‘error statistics’
When people say ‘errors’ they sometimes (but not always) mean ‘error statistics’.
Error: The difference between some estimated/measured quantity and its true value, e.g. $\varepsilon_{est} = x_{est} - x_{true}$ or $\varepsilon_y = y - y_{true}$.
Errors are unknown and unknowable quantities.
Error statistics: Some useful measure of the possible values that $\varepsilon$ could have, e.g. a PDF.
Error statistics are knowable, although often difficult to determine, even in the Gaussian case.
Here, error statistics = second moment (i.e. assume PDFs are Gaussian and unbiased, $\langle\varepsilon\rangle = 0$).
E.g. the second moment of $\varepsilon$, $\langle\varepsilon^2\rangle$ (called a variance), or $\langle\varepsilon^2\rangle^{1/2} = \sigma$ (standard deviation). If only the variance is known, then the PDF is approximated as a Gaussian:
$$P(\varepsilon) \sim \exp\left(-\frac{\varepsilon^2}{2\langle\varepsilon^2\rangle}\right), \qquad \sigma = \sqrt{\langle\varepsilon^2\rangle}$$
[Figure: PDF(ε) plotted against ε, with width σ = √<ε²>.]
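The distinction between an error and its statistics can be illustrated with a minimal sketch. It assumes a simulated ‘truth’, so that the individual errors are available (which is never the case in practice); the seed, the value of `sigma_true` and the helper `gaussian_pdf` are illustrative choices, not part of the original slides.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the sketch is repeatable
sigma_true = 2.0                 # assumed standard deviation of the error
n_samples = 100_000

# Individual errors: unknowable in a real system (known here only because
# the 'truth' is simulated).
eps = rng.normal(loc=0.0, scale=sigma_true, size=n_samples)

# Error statistics: estimated from the population of errors.
bias = eps.mean()                # estimate of <eps>, close to 0 (unbiased)
variance = np.mean(eps**2)       # estimate of the second moment <eps^2>
sigma = np.sqrt(variance)        # standard deviation


def gaussian_pdf(e, var):
    """Normalised zero-mean Gaussian PDF with variance var."""
    return np.exp(-e**2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)


print(f"bias={bias:.3f}, variance={variance:.3f}, sigma={sigma:.3f}")
```

Each element of `eps` is one realisation of the error; `variance` and `sigma` are the error statistics that a data assimilation system would actually use.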
Plan
A. Erroneous quantities
B. Error statistics are important
C. ‘Observation’ and ‘state’ vectors
D. ‘Inner’ and ‘outer’ products
E. Error covariances
F. Bayes’ theorem
G. Flavours of variational data assimilation
H. Control variable transforms
I. The ‘BLUE’ formula
J. A single observation example
K. Forecast error covariance statistics
L. Hybrid data assimilation
A. Erroneous quantities in data assimilation
Data that are being fitted to
• Observations (real-world data)
• Prior data for the system’s state
• Prior data for any unknown parameters
(The prior data are the information available about the system before observations are considered.)
Imperfections in the assimilation system
• Model error within the DA system
• Representivity
Data that have been fitted
• Data assimilation-fitted data (analysis, i.e. a posteriori error statistics)
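Instrument and representivity errors in an observation y:
$$y = H_{exact}(x_{true}) + \varepsilon_{instrument}$$
$$x_{true} = x_{true}^{resolved} + x_{true}^{unresolved}$$
$$y = H_{exact}(x_{true}^{resolved} + x_{true}^{unresolved}) + \varepsilon_{instrument} \approx H_{exact}(x_{true}^{resolved}) + \mathbf{H}\,x_{true}^{unresolved} + \varepsilon_{instrument} = H_{exact}(x_{true}^{resolved}) + \varepsilon_{representivity} + \varepsilon_{instrument}$$
[A further line of this equation, involving an approximate operator $H_{approx}$ and the model’s version of the true state, is not recoverable from the transcript.]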
B.1 Error statistics are important
1. Error statistics give a measure of confidence in data (e.g. here obs error stats).
[Figure panels: no assimilation; assimilation with large obs errors; assimilation with small obs errors.]
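How the observation error variance controls the fit to an observation can be seen in a scalar sketch. It assumes a single scalar state, observed directly, with Gaussian and unbiased errors; the function name `scalar_analysis` and the numbers are hypothetical and chosen only for illustration.

```python
def scalar_analysis(x_b, var_b, y, var_o):
    """Combine a background x_b and a direct observation y of the same scalar.

    var_b and var_o are the background and observation error variances.
    """
    weight = var_b / (var_b + var_o)   # weight given to the observation
    x_a = x_b + weight * (y - x_b)     # analysis
    var_a = (1.0 - weight) * var_b     # analysis error variance
    return x_a, var_a


x_b, var_b, y = 10.0, 1.0, 12.0
for var_o in (100.0, 1.0, 0.01):       # large, comparable and small obs errors
    x_a, var_a = scalar_analysis(x_b, var_b, y, var_o)
    print(f"var_o={var_o:6.2f}  x_a={x_a:.3f}  var_a={var_a:.4f}")
```

With a large observation error variance the analysis stays close to the background; with a small one it is drawn close to the observation.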
B.2 How are error statistics important?
2. Error statistics of prior data imply relationships between variables.
[Figure: x1 and x2 against time. Legend: background forecast (no assim); analysis forecast (consistent with prior and ob errors).]
x1 and x2 cannot be varied independently by the assimilation here because of the shape of the prior joint PDF.
• Known relationships between variables are often exploited to gain knowledge of the complicated nature of the prior error statistics, e.g. changes in pressure are associated with changes in wind in the mid-latitude atmosphere (geostrophic balance).
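C. ‘Observation’ and ‘state’ vectors
The structure of the state vector, for a meteorological example: u, v, θ, p and q are 3-D fields; λ, φ and ℓ are longitude, latitude and vertical level. There are n elements in total.
$$\mathbf{x} = \begin{pmatrix} u(\lambda,\phi,\ell) \\ v(\lambda,\phi,\ell) \\ \theta(\lambda,\phi,\ell) \\ p(\lambda,\phi,\ell) \\ q(\lambda,\phi,\ell) \\ \text{extra parameters} \end{pmatrix}$$
The observation vector comprises each observation made. There are p observations.
$$\mathbf{y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_p \end{pmatrix}$$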
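D. ‘Inner’ and ‘outer’ products
The inner product (‘scalar’ product) between two vectors gives a scalar:
$$\mathbf{a}^T\mathbf{b} = \begin{pmatrix} a_1 & a_2 & \cdots & a_n \end{pmatrix}\begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix} = a_1b_1 + a_2b_2 + \cdots + a_nb_n, \qquad \mathbf{a},\mathbf{b}\in\mathbb{R}^n,\ \mathbf{a}^T\mathbf{b}\in\mathbb{R}^1$$
The outer product between two vectors gives a matrix:
$$\mathbf{a}\mathbf{b}^T = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}\begin{pmatrix} b_1 & b_2 & \cdots & b_m \end{pmatrix} = \begin{pmatrix} a_1b_1 & a_1b_2 & \cdots & a_1b_m \\ a_2b_1 & a_2b_2 & \cdots & a_2b_m \\ \vdots & \vdots & & \vdots \\ a_nb_1 & a_nb_2 & \cdots & a_nb_m \end{pmatrix}, \qquad \mathbf{a}\in\mathbb{R}^n,\ \mathbf{b}\in\mathbb{R}^m,\ \mathbf{a}\mathbf{b}^T\in\mathbb{R}^{n\times m}$$
When ε is a vector of errors, $\langle\boldsymbol{\varepsilon}\boldsymbol{\varepsilon}^T\rangle$ is a (symmetric) error covariance matrix.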
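A short NumPy sketch of both products, and of an error covariance matrix built as an average of outer products. The vectors, the matrix `true_cov` and the sample size are assumptions made for the demonstration only.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

inner = a @ b            # a^T b: a scalar, 1*4 + 2*5 + 3*6 = 32
outer = np.outer(a, b)   # a b^T: a 3 x 3 matrix with elements a_i * b_j

# Sample estimate of <eps eps^T> from many simulated error vectors.
rng = np.random.default_rng(1)
true_cov = np.array([[1.0, 0.8],
                     [0.8, 2.0]])          # assumed covariance for the demo
eps = rng.multivariate_normal(np.zeros(2), true_cov, size=50_000)
cov = eps.T @ eps / eps.shape[0]           # average of the outer products

print(inner)
print(outer)
print(cov)
```

The estimate `cov` is symmetric by construction, as stated on the slide, and approaches `true_cov` as the sample grows.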
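E. Forms of (Gaussian) error covariances
The one-variable case
$$P(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(x-\langle x\rangle)^2}{2\sigma^2}\right), \qquad \langle x\rangle = x_{true}\ \text{(unbiased)}, \qquad \varepsilon = x - \langle x\rangle$$
$$P(\varepsilon) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{\varepsilon^2}{2\sigma^2}\right), \qquad \sigma = \sqrt{\langle\varepsilon^2\rangle}$$
[Figure: P(x) against x, centred on <x> with width σ = √<ε²>.]
The many-variable case
$$P(\mathbf{x}) = \frac{1}{\sqrt{(2\pi)^n\det\mathbf{S}}}\exp\left(-\frac{1}{2}(\mathbf{x}-\langle\mathbf{x}\rangle)^T\mathbf{S}^{-1}(\mathbf{x}-\langle\mathbf{x}\rangle)\right), \qquad \langle\mathbf{x}\rangle = \mathbf{x}_{true}\ \text{(unbiased)}, \qquad \boldsymbol{\varepsilon} = \mathbf{x} - \langle\mathbf{x}\rangle$$
S is a generic symbol; specific symbols will be used later. For two variables,
$$\mathbf{S} = \langle\boldsymbol{\varepsilon}\boldsymbol{\varepsilon}^T\rangle = \begin{pmatrix} \sigma_1^2 & \langle\varepsilon_1\varepsilon_2\rangle \\ \langle\varepsilon_1\varepsilon_2\rangle & \sigma_2^2 \end{pmatrix}$$
[Figure: P(x) as a function of x1 and x2.]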
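F. Bayes’ theorem and the variational cost function
Bayes’ theorem links the following:
• PDF of the observations (given the truth)
• PDF of the prior information (the background state)
• PDF of the state (given the observations; this is the objective of data assimilation)
$$P(\mathbf{x}|\mathbf{y}) = A\,P(\mathbf{y}|\mathbf{x})\,P(\mathbf{x})$$
$$= A\exp\left[-\frac{1}{2}(\mathbf{y}-h(\mathbf{x}))^T\mathbf{R}^{-1}(\mathbf{y}-h(\mathbf{x}))\right]\exp\left[-\frac{1}{2}(\mathbf{x}-\mathbf{x}_b)^T\mathbf{P}_f^{-1}(\mathbf{x}-\mathbf{x}_b)\right]$$
$$= A\exp\left[-\frac{1}{2}(\mathbf{y}-h(\mathbf{x}))^T\mathbf{R}^{-1}(\mathbf{y}-h(\mathbf{x})) - \frac{1}{2}(\mathbf{x}-\mathbf{x}_b)^T\mathbf{P}_f^{-1}(\mathbf{x}-\mathbf{x}_b)\right]$$
$$\ln P(\mathbf{x}|\mathbf{y}) = \ln A - \frac{1}{2}(\mathbf{y}-h(\mathbf{x}))^T\mathbf{R}^{-1}(\mathbf{y}-h(\mathbf{x})) - \frac{1}{2}(\mathbf{x}-\mathbf{x}_b)^T\mathbf{P}_f^{-1}(\mathbf{x}-\mathbf{x}_b)$$
Maximize $P(\mathbf{x}|\mathbf{y})$ ⇔ minimize $-\ln P(\mathbf{x}|\mathbf{y})$ ⇔ minimize $J(\mathbf{x})$, where
$$J(\mathbf{x}) = \frac{1}{2}(\mathbf{x}-\mathbf{x}_b)^T\mathbf{P}_f^{-1}(\mathbf{x}-\mathbf{x}_b) + \frac{1}{2}(\mathbf{y}-h(\mathbf{x}))^T\mathbf{R}^{-1}(\mathbf{y}-h(\mathbf{x}))$$
Lorenc A.C., Analysis methods for numerical weather prediction, Q.J.R. Meteor. Soc. 112, pp. 1177-1194 (1986)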
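The equivalence between maximising the posterior PDF and minimising the cost function can be checked numerically in one dimension. The sketch below assumes a scalar state, a direct observation operator h(x) = x, and illustrative values for the background, observation and their error variances; none of these come from the slides.

```python
import numpy as np

x_b, var_b = 1.0, 4.0   # background and its error variance (scalar P_f)
y, var_o = 3.0, 1.0     # observation and its error variance (scalar R)


def h(x):
    """Assumed observation operator: a direct observation of x."""
    return x


x = np.linspace(-5.0, 10.0, 3001)                   # grid of candidate states

prior = np.exp(-0.5 * (x - x_b)**2 / var_b)         # P(x), unnormalised
likelihood = np.exp(-0.5 * (y - h(x))**2 / var_o)   # P(y|x), unnormalised
posterior = prior * likelihood                      # P(x|y) up to the constant A

# Cost function J(x): background term plus observation term.
J = 0.5 * (x - x_b)**2 / var_b + 0.5 * (y - h(x))**2 / var_o

x_map = x[np.argmax(posterior)]   # state that maximises the posterior
x_min = x[np.argmin(J)]           # state that minimises the cost function
print(x_map, x_min)
```

Both searches pick out the same grid point, which lies between the background and the observation and closer to whichever has the smaller error variance.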
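G.1 Flavours of variational data assimilation
1. Weak constraint 4D-VAR
General form:
$$J[\mathbf{x}] = \frac{1}{2}(\mathbf{x}(0)-\mathbf{x}_b(0))^T\mathbf{P}_f^{-1}(\mathbf{x}(0)-\mathbf{x}_b(0)) + \frac{1}{2}\sum_{t=0}^{T}(\mathbf{y}(t)-h_t[\mathbf{x}(t)])^T\mathbf{R}_t^{-1}(\mathbf{y}(t)-h_t[\mathbf{x}(t)]) + \frac{1}{2}\sum_{t=1}^{T}\sum_{t'=1}^{T}(\mathbf{x}(t)-m_t[\mathbf{x}(t-1)])^T\mathbf{Q}^{-1}_{t,t'}(\mathbf{x}(t')-m_{t'}[\mathbf{x}(t'-1)])$$
where
$$\mathbf{x} = \begin{pmatrix} \mathbf{x}(0) \\ \mathbf{x}(1) \\ \vdots \\ \mathbf{x}(T) \end{pmatrix}$$
and Q is the model error covariance matrix.
Simplified form: assume model errors are uncorrelated in time (white noise), $\mathbf{Q}^{-1}_{t,t'} = \mathbf{Q}_t^{-1}\delta_{t,t'}$. The last term becomes
$$\frac{1}{2}\sum_{t=1}^{T}(\mathbf{x}(t)-m_t[\mathbf{x}(t-1)])^T\mathbf{Q}_t^{-1}(\mathbf{x}(t)-m_t[\mathbf{x}(t-1)])$$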
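G.2 Flavours of variational data assimilation
2. Strong constraint 4D-VAR
Assume a perfect model: $\mathbf{Q}_t \to 0$, so that $\mathbf{x}(t) = m_t[\mathbf{x}(t-1)]$, i.e. $\mathbf{x}(t) = m_{0\to t}[\mathbf{x}(0)]$. We need to determine $\mathbf{x}(0)$ only, so let $\mathbf{x} \equiv \mathbf{x}(0)$.
$$J(\mathbf{x}) = \frac{1}{2}(\mathbf{x}-\mathbf{x}_b)^T\mathbf{P}_f^{-1}(\mathbf{x}-\mathbf{x}_b) + \frac{1}{2}\sum_{t=0}^{T}(\mathbf{y}(t)-h_t[m_{0\to t}(\mathbf{x})])^T\mathbf{R}_t^{-1}(\mathbf{y}(t)-h_t[m_{0\to t}(\mathbf{x})])$$
The ‘strong constraint’ approximation is valid if model error is negligible.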
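A toy strong-constraint 4D-VAR can be sketched with a linear model, for which the cost function is quadratic and the minimum follows from one linear solve. Everything specific here is an assumption for illustration: the two-variable rotation model `M`, the operator `H` observing the first component only, the covariances, and the simulated observations.

```python
import numpy as np

rng = np.random.default_rng(2)
n, T = 2, 5
theta = 0.3
M = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # assumed linear model m_t
H = np.array([[1.0, 0.0]])                        # observe first component only
Pf = np.eye(n)                                    # background error covariance
R = np.array([[0.1**2]])                          # observation error covariance

x_true0 = np.array([1.0, 0.5])
x_b = x_true0 + rng.normal(0.0, 1.0, n)           # background for x(0)

# Simulated observations y(t) = H x_true(t) + noise, for t = 0..T.
obs = []
x = x_true0.copy()
for t in range(T + 1):
    if t > 0:
        x = M @ x
    obs.append(H @ x + rng.normal(0.0, 0.1, 1))


def cost(x0):
    """Strong-constraint cost: background term plus observation terms."""
    d = x0 - x_b
    J = 0.5 * d @ np.linalg.solve(Pf, d)
    x = x0.copy()
    for t in range(T + 1):
        if t > 0:
            x = M @ x                 # x(t) = m_t[x(t-1)], perfect model
        r = obs[t] - H @ x
        J += 0.5 * r @ np.linalg.solve(R, r)
    return J


# m and h are linear here, so J is quadratic in x(0): solve
# (Hessian) dx = -(gradient at x_b) for the increment to the background.
hess = np.linalg.inv(Pf)
rhs = np.zeros(n)
M0t = np.eye(n)                       # propagator M_{0->t}
for t in range(T + 1):
    if t > 0:
        M0t = M @ M0t
    G = H @ M0t                       # maps x(0) to observation space at time t
    hess += G.T @ np.linalg.solve(R, G)
    rhs += G.T @ np.linalg.solve(R, obs[t] - G @ x_b)
x_a0 = x_b + np.linalg.solve(hess, rhs)

print("J(x_b) =", cost(x_b), " J(x_a) =", cost(x_a0))
```

Only the initial state x(0) is adjusted; the states at later times follow from the model, which is what makes the constraint ‘strong’. For a non-linear model the direct solve would be replaced by an iterative minimisation.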
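G.3 Flavours of variational data assimilation
3. Incremental 4D-VAR (quasi-linear)
Linearisation of m (remember $\mathbf{x} \equiv \mathbf{x}(0)$):
Let $\mathbf{x} = \mathbf{x}_{ref} + \delta\mathbf{x}$. Then
$$m_{0\to t}(\mathbf{x}) = m_{0\to t}(\mathbf{x}_{ref} + \delta\mathbf{x}) \approx m_{0\to t}(\mathbf{x}_{ref}) + \mathbf{M}_{0\to t}\,\delta\mathbf{x} = \mathbf{x}_{ref}(t) + \mathbf{M}_{0\to t}\,\delta\mathbf{x}$$
Linearisation of h:
Let $\mathbf{x}(t) = \mathbf{x}_{ref}(t) + \delta\mathbf{x}(t)$. Then
$$h_t[\mathbf{x}(t)] = h_t[\mathbf{x}_{ref}(t) + \delta\mathbf{x}(t)] \approx h_t[\mathbf{x}_{ref}(t)] + \mathbf{H}_t\,\delta\mathbf{x}(t) = \mathbf{y}^{model}_{ref}(t) + \mathbf{H}_t\,\delta\mathbf{x}(t)$$
Apply to the strong constraint cost function:
Let $\delta\mathbf{x} = \mathbf{x} - \mathbf{x}_{ref}$ and $\delta\mathbf{x}_b = \mathbf{x}_b - \mathbf{x}_{ref}$. Then
$$J(\delta\mathbf{x}) = \frac{1}{2}(\delta\mathbf{x}-\delta\mathbf{x}_b)^T\mathbf{P}_f^{-1}(\delta\mathbf{x}-\delta\mathbf{x}_b) + \frac{1}{2}\sum_{t=0}^{T}(\mathbf{y}(t)-h_t[m_{0\to t}(\mathbf{x}_{ref})]-\mathbf{H}_t\mathbf{M}_{0\to t}\delta\mathbf{x})^T\mathbf{R}_t^{-1}(\mathbf{y}(t)-h_t[m_{0\to t}(\mathbf{x}_{ref})]-\mathbf{H}_t\mathbf{M}_{0\to t}\delta\mathbf{x})$$
Let $\delta\mathbf{y}(t) = \mathbf{y}(t) - h_t[m_{0\to t}(\mathbf{x}_{ref})]$. Then
$$J(\delta\mathbf{x}) = \frac{1}{2}(\delta\mathbf{x}-\delta\mathbf{x}_b)^T\mathbf{P}_f^{-1}(\delta\mathbf{x}-\delta\mathbf{x}_b) + \frac{1}{2}\sum_{t=0}^{T}(\delta\mathbf{y}(t)-\mathbf{H}_t\mathbf{M}_{0\to t}\delta\mathbf{x})^T\mathbf{R}_t^{-1}(\delta\mathbf{y}(t)-\mathbf{H}_t\mathbf{M}_{0\to t}\delta\mathbf{x})$$
The incremental form is exactly linear (its cost function is quadratic in δx) and is valid if non-linearity is ‘weak’.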
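H. Control variable transforms
$$J(\delta\mathbf{x}) = \frac{1}{2}(\delta\mathbf{x}-\delta\mathbf{x}_b)^T\mathbf{B}^{-1}(\delta\mathbf{x}-\delta\mathbf{x}_b) + \frac{1}{2}\sum_{t=0}^{T}(\delta\mathbf{y}(t)-\mathbf{H}_t\mathbf{M}_{0\to t}\delta\mathbf{x})^T\mathbf{R}_t^{-1}(\delta\mathbf{y}(t)-\mathbf{H}_t\mathbf{M}_{0\to t}\delta\mathbf{x})$$
where $\delta\mathbf{x} = \mathbf{x} - \mathbf{x}_{ref}$, $\delta\mathbf{x}_b = \mathbf{x}_b - \mathbf{x}_{ref}$, $\delta\mathbf{y}(t) = \mathbf{y}(t) - h_t[m_{0\to t}(\mathbf{x}_{ref})]$ and $\mathbf{B}\in\mathbb{R}^{n\times n}$. Here δx is the ‘control variable’.
Let $\delta\mathbf{x} = \mathbf{U}\,\delta\boldsymbol{\chi}$ and $\delta\mathbf{x}_b = \mathbf{U}\,\delta\boldsymbol{\chi}_b$:
$$J(\delta\boldsymbol{\chi}) = \frac{1}{2}(\delta\boldsymbol{\chi}-\delta\boldsymbol{\chi}_b)^T\mathbf{U}^T\mathbf{B}^{-1}\mathbf{U}(\delta\boldsymbol{\chi}-\delta\boldsymbol{\chi}_b) + \frac{1}{2}\sum_{t=0}^{T}(\delta\mathbf{y}(t)-\mathbf{H}_t\mathbf{M}_{0\to t}\mathbf{U}\delta\boldsymbol{\chi})^T\mathbf{R}_t^{-1}(\delta\mathbf{y}(t)-\mathbf{H}_t\mathbf{M}_{0\to t}\mathbf{U}\delta\boldsymbol{\chi})$$
Design U such that $\mathbf{U}^T\mathbf{B}^{-1}\mathbf{U} = \mathbf{I}$:
$$J(\delta\boldsymbol{\chi}) = \frac{1}{2}(\delta\boldsymbol{\chi}-\delta\boldsymbol{\chi}_b)^T(\delta\boldsymbol{\chi}-\delta\boldsymbol{\chi}_b) + \frac{1}{2}\sum_{t=0}^{T}(\delta\mathbf{y}(t)-\mathbf{H}_t\mathbf{M}_{0\to t}\mathbf{U}\delta\boldsymbol{\chi})^T\mathbf{R}_t^{-1}(\delta\mathbf{y}(t)-\mathbf{H}_t\mathbf{M}_{0\to t}\mathbf{U}\delta\boldsymbol{\chi})$$
Here δχ is the ‘control variable’.
Advantages of the new control variable:
• No B-matrix to worry about
• Better conditioning of the problem
In practice we design U, and B follows (the implied $\mathbf{B} = \mathbf{U}\mathbf{U}^T$). The design of U is called ‘background error covariance modelling’.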
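The defining property of the transform can be verified numerically. The sketch below goes the ‘wrong way round’ compared with practice: it assumes a small, explicitly known B and takes U to be its Cholesky factor, which is just one convenient choice satisfying B = U U^T. In operational systems B is too large to store and U is instead built from physical and statistical modelling.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
A = rng.normal(size=(n, n))
B = A @ A.T + n * np.eye(n)     # assumed symmetric positive-definite B

U = np.linalg.cholesky(B)       # lower-triangular factor, B = U U^T
identity = U.T @ np.linalg.solve(B, U)   # U^T B^{-1} U, should equal I

dx = rng.normal(size=n)         # an arbitrary increment
dx_b = rng.normal(size=n)       # an arbitrary background increment
dchi = np.linalg.solve(U, dx)   # control variable: dx = U dchi
dchi_b = np.linalg.solve(U, dx_b)

# Background term of the cost function in the two sets of variables.
Jb_x = 0.5 * (dx - dx_b) @ np.linalg.solve(B, dx - dx_b)
Jb_chi = 0.5 * (dchi - dchi_b) @ (dchi - dchi_b)

print(np.round(identity, 12))
print(Jb_x, Jb_chi)
```

The two background terms agree, but the second needs no inverse of B: the covariance information has moved into U.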
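I. The cost function and the ‘BLUE’ formula
$$J(\mathbf{x}) = \frac{1}{2}(\mathbf{x}-\mathbf{x}_b)^T\mathbf{P}_f^{-1}(\mathbf{x}-\mathbf{x}_b) + \frac{1}{2}(\mathbf{y}-h(\mathbf{x}))^T\mathbf{R}^{-1}(\mathbf{y}-h(\mathbf{x}))$$
Linearise the observation operator about the background, $h(\mathbf{x}) \approx h(\mathbf{x}_b) + \mathbf{H}(\mathbf{x}-\mathbf{x}_b)$:
$$J(\mathbf{x}) = \frac{1}{2}(\mathbf{x}-\mathbf{x}_b)^T\mathbf{P}_f^{-1}(\mathbf{x}-\mathbf{x}_b) + \frac{1}{2}(\mathbf{y}-h(\mathbf{x}_b)-\mathbf{H}(\mathbf{x}-\mathbf{x}_b))^T\mathbf{R}^{-1}(\mathbf{y}-h(\mathbf{x}_b)-\mathbf{H}(\mathbf{x}-\mathbf{x}_b))$$
$$\nabla_{\mathbf{x}}J = \mathbf{P}_f^{-1}(\mathbf{x}-\mathbf{x}_b) - \mathbf{H}^T\mathbf{R}^{-1}(\mathbf{y}-h(\mathbf{x}_b)-\mathbf{H}(\mathbf{x}-\mathbf{x}_b)) = 0 \quad \text{at the minimum of the cost function } (\mathbf{x} = \mathbf{x}_a)$$
$$\mathbf{P}_f^{-1}(\mathbf{x}_a-\mathbf{x}_b) - \mathbf{H}^T\mathbf{R}^{-1}(\mathbf{y}-h(\mathbf{x}_b)-\mathbf{H}(\mathbf{x}_a-\mathbf{x}_b)) = 0$$
$$(\mathbf{P}_f^{-1} + \mathbf{H}^T\mathbf{R}^{-1}\mathbf{H})(\mathbf{x}_a-\mathbf{x}_b) - \mathbf{H}^T\mathbf{R}^{-1}(\mathbf{y}-h(\mathbf{x}_b)) = 0$$
$$\mathbf{x}_a = \mathbf{x}_b + (\mathbf{P}_f^{-1} + \mathbf{H}^T\mathbf{R}^{-1}\mathbf{H})^{-1}\mathbf{H}^T\mathbf{R}^{-1}(\mathbf{y}-h(\mathbf{x}_b)) \quad \text{(the ‘information form’)}$$
Using the Sherman-Morrison-Woodbury formula,
$$(\mathbf{P}_f^{-1} + \mathbf{H}^T\mathbf{R}^{-1}\mathbf{H})^{-1}\mathbf{H}^T\mathbf{R}^{-1} = \mathbf{P}_f\mathbf{H}^T(\mathbf{R} + \mathbf{H}\mathbf{P}_f\mathbf{H}^T)^{-1}$$
we get
$$\mathbf{x}_a = \mathbf{x}_b + \mathbf{P}_f\mathbf{H}^T(\mathbf{R} + \mathbf{H}\mathbf{P}_f\mathbf{H}^T)^{-1}(\mathbf{y}-h(\mathbf{x}_b)) \quad \text{(the ‘covariance form’)}$$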
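That the information and covariance forms give the same analysis can be checked with random matrices. This is a sketch under the assumption of a linear observation operator h(x) = Hx; the dimensions, matrices and vectors are arbitrary test values, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 5, 3                           # state and observation dimensions
A = rng.normal(size=(n, n))
Pf = A @ A.T + np.eye(n)              # assumed forecast error covariance
R = np.diag([0.5, 1.0, 2.0])          # assumed observation error covariance
H = rng.normal(size=(p, n))           # assumed linear observation operator
x_b = rng.normal(size=n)
y = rng.normal(size=p)
d = y - H @ x_b                       # innovation y - h(x_b), with h(x) = H x

# Information form: solve in state space (n x n system).
Pf_inv = np.linalg.inv(Pf)
R_inv = np.linalg.inv(R)
x_a_info = x_b + np.linalg.solve(Pf_inv + H.T @ R_inv @ H, H.T @ R_inv @ d)

# Covariance form: solve in observation space (p x p system).
x_a_cov = x_b + Pf @ H.T @ np.linalg.solve(R + H @ Pf @ H.T, d)

# Gradient of the cost function at the analysis (should vanish).
grad = Pf_inv @ (x_a_cov - x_b) - H.T @ R_inv @ (y - H @ x_a_cov)

print(np.max(np.abs(x_a_info - x_a_cov)), np.max(np.abs(grad)))
```

The information form needs the inverse of P_f and an n × n solve, whereas the covariance form needs only a p × p solve, which is one reason the choice of form matters when n and p differ greatly.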
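J. A single observation example
Analysis increment of the assimilation of a direct observation of one variable.
$$\mathbf{x}_a - \mathbf{x}_b = \mathbf{P}_f\mathbf{H}^T(\mathbf{R} + \mathbf{H}\mathbf{P}_f\mathbf{H}^T)^{-1}(\mathbf{y}-h(\mathbf{x}_b))$$
For a single direct observation y at position r, with error variance $\sigma_y^2$, $\mathbf{H} = (0\ \cdots\ 1\ \cdots\ 0)$ (with the 1 at position r) and $\mathbf{R} = \sigma_y^2$:
$$\mathbf{x}_a - \mathbf{x}_b = \mathbf{P}_f\begin{pmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{pmatrix}\left[\sigma_y^2 + (0\ \cdots\ 1\ \cdots\ 0)\,\mathbf{P}_f\begin{pmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{pmatrix}\right]^{-1}(y - x_b(r)) = \mathbf{P}_f\begin{pmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{pmatrix}\frac{y - x_b(r)}{\sigma_y^2 + P_f(r,r)}$$
$$x_a(r') - x_b(r') = P_f(r',r)\,\frac{y - x_b(r)}{\sigma_y^2 + P_f(r,r)}$$
[Figure: obs of atmospheric pressure →]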
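The single observation result says that the analysis increment is one column of P_f, scaled by the weighted innovation. A minimal sketch, assuming a 1-D grid and a Gaussian-shaped covariance function (the length scale, variances and observation position are arbitrary illustrative values):

```python
import numpy as np

n = 50
r = np.arange(n)                      # grid positions
length = 5.0                          # assumed correlation length scale
Pf = np.exp(-0.5 * ((r[:, None] - r[None, :]) / length) ** 2)  # assumed P_f

k = 20                                # position of the single observation
sigma_y2 = 0.25                       # observation error variance
x_b = np.zeros(n)                     # background
y = 1.0                               # observed value

H = np.zeros((1, n))
H[0, k] = 1.0                         # direct observation of element k
R = np.array([[sigma_y2]])

# Full matrix formula ...
increment_full = Pf @ H.T @ np.linalg.solve(R + H @ Pf @ H.T,
                                            np.array([y]) - H @ x_b)
# ... and its single-observation simplification: a scaled column of P_f.
increment_column = Pf[:, k] * (y - x_b[k]) / (sigma_y2 + Pf[k, k])

print(np.max(np.abs(increment_full - increment_column)))
```

Plotting `increment_column` against `r` would show the structure of the covariance function centred on the observation, which is why single observation experiments are a common way to inspect an implied covariance matrix.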
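K.1 Forecast error covariance statistics
• In data assimilation prior information often comes from a forecast.
• Forecast error covariance statistics (Pf) specify how the forecast might be in error: $\boldsymbol{\varepsilon}_f = \mathbf{x}_f - \mathbf{x}_{true}$, $\mathbf{P}_f = \langle\boldsymbol{\varepsilon}_f\boldsymbol{\varepsilon}_f^T\rangle$.
• How could Pf be estimated for use in data assimilation?
  • Analysis of innovations (*).
  • Differences of varying length forecasts (‘NMC method’).
  • Monte-Carlo method (*).
  • Forecast time lags.
• Problems with the above methods.
• A climatological average forecast error covariance matrix is called ‘B’.
$$\mathbf{P}_f \text{ or } \mathbf{B} = \begin{pmatrix} \langle\varepsilon_1\varepsilon_1\rangle & \langle\varepsilon_1\varepsilon_2\rangle & \langle\varepsilon_1\varepsilon_3\rangle \\ \langle\varepsilon_2\varepsilon_1\rangle & \langle\varepsilon_2\varepsilon_2\rangle & \langle\varepsilon_2\varepsilon_3\rangle \\ \langle\varepsilon_3\varepsilon_1\rangle & \langle\varepsilon_3\varepsilon_2\rangle & \langle\varepsilon_3\varepsilon_3\rangle \end{pmatrix}$$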
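K.2 Analysis of innovations
We don’t know the truth, but we do have observations of the truth with known error statistics.
Definition of observation error: $y = y_{true} + \varepsilon_y = h(x_{true}) + \varepsilon_y$
Definition of forecast error: $x_{true} = x_f - \varepsilon_f$
Eliminate $x_{true}$: $y = h(x_f - \varepsilon_f) + \varepsilon_y \approx h(x_f) - H\varepsilon_f + \varepsilon_y$
‘Innovation’: $y - h(x_f) \approx \varepsilon_y - H\varepsilon_f$ (LHS known, RHS unknown)
Take pairs of in-situ obs whose errors are uncorrelated (for variable $v_1$ at position $r$, and variable $v_2$ at position $r+\Delta r$):
$$y(v_1,r) - x_f(v_1,r) \approx \varepsilon_y(v_1,r) - \varepsilon_f(v_1,r)$$
$$y(v_2,r+\Delta r) - x_f(v_2,r+\Delta r) \approx \varepsilon_y(v_2,r+\Delta r) - \varepsilon_f(v_2,r+\Delta r)$$
Covariances:
$$\langle[y(v_1,r) - x_f(v_1,r)][y(v_2,r+\Delta r) - x_f(v_2,r+\Delta r)]\rangle = \langle[\varepsilon_y(v_1,r) - \varepsilon_f(v_1,r)][\varepsilon_y(v_2,r+\Delta r) - \varepsilon_f(v_2,r+\Delta r)]\rangle$$
$$= \langle\varepsilon_y(v_1,r)\,\varepsilon_y(v_2,r+\Delta r)\rangle - \langle\varepsilon_y(v_1,r)\,\varepsilon_f(v_2,r+\Delta r)\rangle - \langle\varepsilon_f(v_1,r)\,\varepsilon_y(v_2,r+\Delta r)\rangle + \langle\varepsilon_f(v_1,r)\,\varepsilon_f(v_2,r+\Delta r)\rangle$$
• First term: obs error covariance between ($v_1$, $r$) and ($v_2$, $r+\Delta r$); zero unless $v_1 = v_2$ and $\Delta r = 0$.
• Second and third terms: zero (obs and forecast errors uncorrelated).
• Fourth term: forecast error covariance between ($v_1$, $r$) and ($v_2$, $r+\Delta r$) (one particular matrix element of Pf or B).
<> denotes an average over available observations and the sample population of forecasts.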
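The separation of observation and forecast error statistics in the innovation covariances can be demonstrated with simulated errors. The sketch assumes one variable on a 1-D grid, direct observations at every grid point (H = I), an exponential forecast error covariance and spatially uncorrelated observation errors; all of these are illustrative assumptions. In a real system only the innovations would be available, not the separate errors.

```python
import numpy as np

rng = np.random.default_rng(5)
n, n_samples = 20, 20_000
r = np.arange(n)
Pf_true = np.exp(-np.abs(r[:, None] - r[None, :]) / 3.0)  # assumed P_f
sigma_y2 = 0.5                                            # assumed obs variance

# Forecast errors are spatially correlated (sampled via a Cholesky factor);
# observation errors are uncorrelated between positions.
chol = np.linalg.cholesky(Pf_true)
eps_f = rng.normal(size=(n_samples, n)) @ chol.T
eps_y = rng.normal(0.0, np.sqrt(sigma_y2), size=(n_samples, n))

innov = eps_y - eps_f                      # y - h(x_f) ~ eps_y - H eps_f
innov_cov = innov.T @ innov / n_samples    # average over the sample

i = 10
cov_zero_sep = innov_cov[i, i]       # ~ sigma_y2 + Pf(i, i)
cov_sep_2 = innov_cov[i, i + 2]      # ~ Pf(i, i+2): obs errors drop out

print(cov_zero_sep, sigma_y2 + Pf_true[i, i])
print(cov_sep_2, Pf_true[i, i + 2])
```

At non-zero separation the innovation covariance estimates an element of P_f alone; at zero separation it contains the sum of the observation and forecast error variances.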
K.3 Monte-Carlo method (ensembles)
• N members of an ensemble of analyses.
• Leads to N members of an ensemble of forecasts.
• The ensemble must capture the errors contributing to forecast errors.
  • Initial condition errors (forecast/observation/assimilation errors from previous data assimilation).
  • Model formulation errors (finite resolution, unknown parameters, …).
  • Unknown forcing.
• Can be used to estimate the forecast error covariance matrix, e.g.
$$\mathbf{P}_f \approx \langle(\mathbf{x}-\langle\mathbf{x}\rangle)(\mathbf{x}-\langle\mathbf{x}\rangle)^T\rangle = \frac{1}{N-1}\sum_{i=1}^{N}(\mathbf{x}_i-\langle\mathbf{x}\rangle)(\mathbf{x}_i-\langle\mathbf{x}\rangle)^T, \qquad \langle\mathbf{x}\rangle = \frac{1}{N}\sum_{i=1}^{N}\mathbf{x}_i$$
• Problem: for some applications N << n.
  • n elements of the state vector (in meteorology can be $10^7$).
  • N ensemble members (typically $10^2$).
  • Consequence: when Pf acts on a vector, the result is forced to lie in the sub-space spanned by the N ensemble members.
[Figure: ensembles, x against t.]
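The rank problem can be seen directly in a small sketch. The ensemble here is synthetic (random numbers standing in for forecast states), and the sizes n = 100 and N = 10 are assumptions chosen to keep the example fast; the point is only the relation between N and the rank of the sample covariance.

```python
import numpy as np

rng = np.random.default_rng(6)
n, N = 100, 10                               # state size and ensemble size
ensemble = rng.normal(size=(n, N))           # columns are members x_i (synthetic)

x_mean = ensemble.mean(axis=1, keepdims=True)    # <x>
X = ensemble - x_mean                            # perturbations x_i - <x>
Pf = X @ X.T / (N - 1)                           # sample covariance, n x n

rank = np.linalg.matrix_rank(Pf)             # at most N - 1, far below n

# Acting with Pf on any vector gives a result in the span of the
# ensemble perturbations: the least-squares fit leaves no residual.
v = rng.normal(size=n)
w = Pf @ v
coeffs, *_ = np.linalg.lstsq(X, w, rcond=None)
residual = np.linalg.norm(X @ coeffs - w)

print("rank of Pf:", rank, " residual:", residual)
```

The rank is at most N - 1 rather than N because subtracting the ensemble mean removes one degree of freedom from the perturbations.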
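L. Hybrid data assimilation
Variational data assimilation (with B-matrix)
• B is full-rank
• B is static
Ensemble data assimilation
• Sample Pf is flow dependent
• Sample Pf is low rank
Suppose
$$\delta\mathbf{x} = \delta\mathbf{x}_V + \delta\mathbf{x}_E, \qquad \delta\mathbf{x}_V = \mathbf{U}_V\,\delta\boldsymbol{\chi}_V, \qquad \delta\mathbf{x}_E = \mathbf{U}_E\,\delta\boldsymbol{\chi}_E$$
(and similarly for the background increments $\delta\boldsymbol{\chi}_b^V$ and $\delta\boldsymbol{\chi}_b^E$), with
$$\mathbf{U}_V\mathbf{U}_V^T = \mathbf{B} \ \text{(as in standard incremental variational data assimilation)}, \qquad \mathbf{U}_E = \frac{1}{\sqrt{N-1}}\begin{pmatrix} \mathbf{x}_1-\langle\mathbf{x}\rangle & \cdots & \mathbf{x}_N-\langle\mathbf{x}\rangle \end{pmatrix} \in \mathbb{R}^{n\times N}$$
$$J(\delta\boldsymbol{\chi}_V, \delta\boldsymbol{\chi}_E) = \frac{1}{2}(\delta\boldsymbol{\chi}_V-\delta\boldsymbol{\chi}_b^V)^T(\delta\boldsymbol{\chi}_V-\delta\boldsymbol{\chi}_b^V) + \frac{1}{2}(\delta\boldsymbol{\chi}_E-\delta\boldsymbol{\chi}_b^E)^T(\delta\boldsymbol{\chi}_E-\delta\boldsymbol{\chi}_b^E) + \frac{1}{2}\sum_{t=0}^{T}(\delta\mathbf{y}(t)-\mathbf{H}_t\mathbf{M}_{0\to t}[\mathbf{U}_V\delta\boldsymbol{\chi}_V+\mathbf{U}_E\delta\boldsymbol{\chi}_E])^T\mathbf{R}_t^{-1}(\delta\mathbf{y}(t)-\mathbf{H}_t\mathbf{M}_{0\to t}[\mathbf{U}_V\delta\boldsymbol{\chi}_V+\mathbf{U}_E\delta\boldsymbol{\chi}_E])$$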
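The covariance implied by a two-part control variable can be sketched as follows. It assumes that the increment is simply U_V δχ_V + U_E δχ_E with no extra weighting factors between the two parts (schemes used in practice may weight them); the static B, its Cholesky factor used as U_V, and the synthetic ensemble are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
n, N = 30, 5
r = np.arange(n)
B = np.exp(-np.abs(r[:, None] - r[None, :]) / 4.0)   # assumed static B
U_V = np.linalg.cholesky(B)                          # U_V U_V^T = B (full rank)

ensemble = rng.normal(size=(n, N))                   # synthetic ensemble members
X = ensemble - ensemble.mean(axis=1, keepdims=True)  # perturbations
U_E = X / np.sqrt(N - 1)                             # n x N, U_E U_E^T = sample Pf

# Stacked transform acting on the combined control vector (dchi_V, dchi_E).
U = np.hstack([U_V, U_E])
implied_cov = U @ U.T                                # = B + sample Pf

rank_E = np.linalg.matrix_rank(U_E @ U_E.T)          # low rank (at most N - 1)
rank_hybrid = np.linalg.matrix_rank(implied_cov)     # full rank

print("rank of ensemble part:", rank_E, " rank of hybrid:", rank_hybrid)
```

Under these assumptions the implied covariance combines the full rank of the static part with the flow dependence of the ensemble part, which is the motivation for the hybrid approach.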
Comments and Questions
!?