


The role of additive and multiplicative noise in filtering complex dynamical systems

By Georg A. Gottwald 1 and John Harlim 2

1 School of Mathematics and Statistics, University of Sydney, NSW 2006, Australia, [email protected]

2 Department of Mathematics, North Carolina State University, BOX 8205, Raleigh, NC 27695, U.S.A., [email protected]

Covariance inflation is an ad-hoc treatment that is widely used in practical real-time data assimilation algorithms to mitigate covariance underestimation due to model errors, nonlinearity, and/or, in the context of ensemble filters, insufficient ensemble size. In this paper, we systematically derive an effective “statistical” inflation for filtering multi-scale dynamical systems with a moderate scale gap, ǫ = O(10^{-1}), up to the case of no scale gap, ǫ = O(1), in the presence of model errors, through reduced dynamics from rigorous stochastic subgrid-scale parametrizations.

We will demonstrate that for linear problems, an effective covariance inflation is achieved by a systematically derived additive noise in the forecast model, producing superior filtering skill. For nonlinear problems, we will study an analytically solvable stochastic test model, mimicking turbulent signals in regimes ranging from a turbulent energy transfer range to a dissipative range to a laminar regime. In this context, we will show that multiplicative noise naturally arises in addition to additive noise in a reduced stochastic forecast model. Subsequently, we will show that a “statistical” inflation factor that involves a mean correction in addition to covariance inflation is necessary to achieve accurate filtering in the presence of intermittent instability in both the turbulent energy transfer range and the dissipative range.

Keywords: data assimilation; filtering; multi-scale systems; covariance inflation; stochastic parametrisation; additive noise; multiplicative noise; model error; averaging

1. Introduction

An integral part of each numerical weather forecasting scheme is the estimation of the most accurate possible initial conditions given a forecast model with possible model error and noisy observations at discrete observation intervals; this process is called data assimilation (see e.g. Kalnay 2002, Majda & Harlim 2012). The presence of the often chaotic multi-scale nature of the underlying nonlinear dynamics significantly complicates this process. Each forecast estimate, or background state, is an approximation of the current atmospheric state, with a whole range of sources for error and uncertainty such as model error, errors due to the nonlinear chaotic nature of the model, the unavoidable presence of unresolved subgrid-scales, as well as errors caused by the numerical discretisation (see e.g. Palmer 2001).

An attractive method for data assimilation is the ensemble Kalman filter introduced by Evensen (1994, 2006), which computes a Monte-Carlo approximation of the forecast error covariance on-the-fly as an estimate for the degree of uncertainty of the forecast caused by the model dynamics. The idea behind ensemble based methods is that the nonlinear chaotic dynamics of the underlying forecast model and the associated sensitivity to initial conditions cause an ensemble of trajectories to explore sufficiently large parts of the phase space in order to deduce meaningful statistical properties of the dynamics. A requirement for a reliable estimate is an adequate size of the ensemble (Houtekamer & Mitchell 1998, Ehrendorfer 2007, Petrie & Dance 2010). However, all currently operational ensemble systems suffer from insufficient ensemble sizes causing sampling errors. The associated underdispersiveness implies that the true atmospheric state is on average outside the statistically expected range of the forecast or analysis (e.g. Buizza et al. (2005), Hamill & Whitaker (2011)). An underdispersive ensemble usually underestimates the forecast error covariance,

Article submitted to Royal Society TEX Paper


which potentially leads to filter divergence, whereby the filter trusts its own forecast and ignores the information provided by the observations. This filter divergence is caused by ensemble members aligning with the most unstable direction of the forecast dynamics, as shown by Ng et al. (2011). To mitigate this underdispersiveness of the forecast ensemble and avoid filter divergence, the method of covariance inflation was introduced (Anderson & Anderson 1999), whereby the prior forecast error covariance is artificially increased in each assimilation cycle. Covariance inflation can either be done in a multiplicative (Anderson 2001) or in an additive fashion (Sandu et al. 2007, Houtekamer et al. 2009). This is usually done globally and involves careful and computationally expensive tuning of the inflation factor (for recent methods on adaptive estimation of the inflation factor from the innovation statistics, see Anderson 2007, 2009, Li et al. 2009, Miyoshi 2011). Although covariance inflation is widely used in the context of ensemble based filters, it can also be used to mitigate covariance underestimation in the presence of model errors in a non-ensemble based setting. For example, Majda & Harlim (2012) discuss an effective covariance inflation using an information theory criterion to compensate for model errors due to numerical discretization in linear filtering problems. From the perspective of linear Kalman filter theory, covariance inflation improves the linear controllability condition for accurate filtered solutions (Castronovo et al. 2008). Law & Stuart (2012) inflate the static background covariance in a 3DVAR setting to stabilise the filter.
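The two standard flavours of inflation mentioned above can be sketched in a few lines. The following toy example is my own illustration (the 2×2 covariance and the inflation factors are arbitrary, not values from this paper); it shows that both variants increase the ensemble spread, measured by the trace of the covariance:

```python
import numpy as np

def inflate_multiplicative(P, r):
    """Multiplicative inflation (Anderson 2001): scale the covariance by r > 1."""
    return r * P

def inflate_additive(P, Q):
    """Additive inflation: add a positive semi-definite matrix Q."""
    return P + Q

# Toy 2x2 forecast error covariance, deliberately underdispersive.
P = np.array([[0.5, 0.1],
              [0.1, 0.3]])

P_mult = inflate_multiplicative(P, 1.1)      # 10% multiplicative inflation
P_add = inflate_additive(P, 0.05 * np.eye(2))  # additive inflation by 0.05 I

# Both variants increase the total spread (trace of the covariance).
print(np.trace(P), np.trace(P_mult), np.trace(P_add))
```

Both operations preserve symmetry and positive definiteness; the rest of the paper is concerned with deriving the additive term systematically rather than tuning it by hand.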

The goal of this paper is to compute an effective “statistical inflation” to achieve accurate filtering in the presence of model errors. In particular, we study filtering multi-scale dynamics with model errors arising from a misrepresentation of unresolved sub-grid scale processes in the regime of moderate time-scale separation. The filtering problem where the fast degrees of freedom are not observable has been considered by Pavliotis & Stuart (2007), Zhang (2011), Mitchell & Gottwald (2012), Imkeller et al. (2012), under a large scale gap assumption, ǫ ≪ 1. Several numerical methods to address the same problem for a moderate scale gap were developed by Harlim (2011), Kang & Harlim (2012), Harlim & Majda (2013). While these methods produce encouraging results on nontrivial applications, they are unfortunately difficult to justify in a rigorous fashion. Averaging and homogenisation techniques developed by Khasminskii (1968), Kurtz (1973), Papanicolaou (1976) provide a rigorous backbone for developing reduced slow dynamics for certain classes of multi-scale systems. Following the idea of Mitchell & Gottwald (2012), we will study how these stochastic model reductions can be used in data assimilation and how they act as a dynamically consistent way of statistical inflation. Our particular emphasis here is to understand the role of additive and multiplicative noise in stochastic model reduction in improving filtering skill. We employ theoretically well-understood stochastic model reductions such as averaging, formulated in the framework of singular perturbation theory, and stochastic invariant manifold theory. We consider multi-scale systems whose averaging limit does not generate additional stochastic diffusion terms; as an innovative feature, we reintroduce stochastic effects induced by the fast dynamics by going one order higher in the time scale separation parameter ǫ in the classical averaging theory. The question we address here is in what way these stochastic effects improve the process of data assimilation for multi-scale systems.

The outline of the paper is the following. In Section 2 we consider a linear model and show how additive inflation naturally arises as judicious model error for the resolved slow scales. We will show that while a singular perturbation expansion allows us to improve the estimates of the statistical moments when compared to the averaged system, the reduced Fokker-Planck equation for the slow variable does not support a non-negative probability density function. Consequently, this method does not provide any means to estimate the temporal evolution of the mean and the variance needed for data assimilation. As a remedy, we will derive an approximate reduced stochastic model via invariant manifold theory (Berglund & Gentz 2005, Fenichel 1979, Boxler 1989, Roberts 2008). This yields a Langevin equation for the slow variable whose variance is one order less accurate than that obtained with the singular perturbation theory, but which provides an explicit expression for the temporal evolution that can be used to propagate mean and variance in the data assimilation step. The reduced stochastic differential equation obtained through stochastic invariant manifold theory produces significantly better filtering skill when compared to the classical averaged equation. The additional additive noise correction of the reduced stochastic system provides an effective covariance inflation factor. Nonlinear problems will be discussed in Section 3. There, we will consider an analytically solvable nonlinear stochastic test model, mimicking turbulent signals in regimes ranging from the turbulent energy transfer range to the dissipative range to a laminar regime (Majda & Harlim 2012, Branicki et al. 2012). We will show that the approximate reduced model involves both additive and multiplicative noise, and improves the accuracy in the actual error convergence rate of solutions when compared to the classic averaged equations. For data assimilation applications, we will derive an analytically solvable non-Gaussian filter based on the approximate reduced model with additive and multiplicative noise corrections, which effectively inflates the first and second order statistics. We will demonstrate that this non-Gaussian filter produces accurate filtered solutions comparable to those of the true filter in the presence of intermittent instability in both the turbulent energy transfer range and the dissipative range. We conclude with a short summary and discussion in Section 4. We accompany this article with electronic supplementary material that contains the detailed calculations.

2. Linear model

We first consider the linear multi-scale model

dx = (a11 x + a12 y) dt + σx dWx, (2.1)

dy = (1/ǫ)(a21 x + a22 y) dt + (σy/√ǫ) dWy, (2.2)

for a slow variable x ∈ R and a fast variable y ∈ R. Here, Wx, Wy are independent Wiener processes and the parameter ǫ characterizes the time scale gap. We assume throughout that σx, σy ≠ 0 and that the eigenvalues of the matrix

A = ( a11     a12
      a21/ǫ   a22/ǫ )

are strictly negative, to assure the existence of a unique invariant joint density. Furthermore, we require a = a11 − a12 a21/a22 < 0 to assure that the leading order slow dynamics supports an invariant density (cf. (2.3)).
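The system (2.1)-(2.2) is straightforward to simulate directly. The following Euler–Maruyama sketch is my own illustration, not code from the paper; the parameter values are those used later in Figure 2, while the step size, horizon, and seed are arbitrary choices:

```python
import numpy as np

def simulate_linear_multiscale(eps, T=10.0, dt=1e-3, seed=0,
                               a11=-1.0, a12=1.0, a21=-1.0, a22=-1.0,
                               sx=np.sqrt(2), sy=np.sqrt(2)):
    """Euler-Maruyama integration of the slow-fast system (2.1)-(2.2).

    Returns the sample path of the slow variable x on the grid k*dt.
    """
    rng = np.random.default_rng(seed)
    n = int(T / dt)
    x, y = 0.0, 0.0
    xs = np.empty(n + 1)
    xs[0] = x
    for k in range(n):
        dWx, dWy = rng.normal(0.0, np.sqrt(dt), 2)
        # Slow equation (2.1) and fast equation (2.2); note the 1/eps drift
        # and the 1/sqrt(eps) noise amplitude in the fast variable.
        x_new = x + (a11 * x + a12 * y) * dt + sx * dWx
        y = y + (a21 * x + a22 * y) / eps * dt + sy / np.sqrt(eps) * dWy
        x = x_new
        xs[k + 1] = x
    return xs

xs = simulate_linear_multiscale(eps=0.1)
print(xs.shape, np.isfinite(xs).all())
```

For ǫ = 0.1 the fast relaxation rate is |a22|/ǫ = 10, so the step size dt = 10⁻³ comfortably resolves the fast scale; stiffer choices of ǫ would require a smaller step.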

(a) Singular perturbation theory: averaging and beyond

The multi-scale linear system (2.1)-(2.2) is amenable to averaging. A recent exposition of the theory of averaging and its applications is provided for example in Givon et al. (2004) and in Pavliotis & Stuart (2008). The goal of this section is to apply singular perturbation theory to obtain one order higher than the usual averaging limit. We will derive an O(ǫ) approximation for the joint probability density function ρ(x, y, t) of the full multi-scale system (2.1)-(2.2), when marginalised over the fast variable y. The resulting higher-order approximation of the probability density will turn out to be not strictly non-negative, which implies that it is not a proper probability density function, and therefore cannot be associated with a stochastic differential equation for the slow variable x. However, it will enable us to find corrections to the mean and variance and all higher order moments.

The standard effective averaged slow dynamics,

dX = aX dt + σx dWx, (2.3)

is obtained by averaging the slow equation (2.1) over the conditional invariant measure ρ∞(y; x) induced by the fast dynamics, where the slow variable x is assumed fixed in the limit of infinite time scale separation ǫ → 0. Formally, the effective averaged slow dynamics can be derived by finding a solution of the Fokker-Planck equation associated with (2.1)-(2.2) with the following multiscale expansion,

ρ(x, y, t) = ρ0 + ǫρ1 + ǫ²ρ2 + · · · , (2.4)

and the reduced system in (2.3) is the Langevin equation corresponding to the marginal distribution ∫ ρ0(x, y, t) dy (see Appendix A in the electronic supplementary material). Here, we are looking for a higher order correction through the contribution from ρ1(x, y, t). In order for ρ0 + ǫρ1 to be a density we require

∫∫ ρ1 dx dy = 0. (2.5)


Figure 1. Absolute error EV of the variance of the slow variable x when calculated from the full multi-scale system (2.1)-(2.2) and from the singular perturbation theory result (2.6). The variances were evaluated at time t = 2.8. Parameters were a11 = 1, a12 = −1, a21 = −1, a22 = −1 and σx = σy = √2. Linear regression of the upper two data sets (Q0 and Q0 + ǫQ1) reveals slopes of 0.95 and 0.90, respectively; the slope of the lower line (Q0 + ǫ(Q1 + Q2)) is estimated as 1.80.

Again, see Appendix A (and eqn (A14)) for the detailed discussion and an explicit solution for ρ1 that satisfies (2.5).

We find that for each value of ǫ the expression ρ = ρ0 + ǫρ1 is not a strictly positive function of x and y. This implies that, contrary to ρ0, the next order probability density function ρ = ρ0 + ǫρ1 does not satisfy a Fokker-Planck equation which describes the time evolution of a probability density function. As such it is not possible to find a reduced Langevin equation corresponding to ρ = ρ0 + ǫρ1. This indicates that in general memory effects are too significant to allow for a decorrelation at that order. In the next subsection we will explore a different approximative approach which allows us to derive approximate slow Langevin equations. This, as we will see, will come at the cost of a less accurate estimate of the variance of the slow variable.

Despite the lack of a probabilistic interpretation of the Fokker-Planck equation associated with ρ0 + ǫρ1, we can use the higher-order approximation of the probability density function to calculate O(ǫ)-corrections to the variance of the slow variable. We find that Eρ1(x) = 0, which implies that ρ1 does not contribute to the mean solution, and

E(x²) = Eρ0(x²) + ǫ Eρ1(x²) + O(ǫ²)
      = −(σx²/(2a))(1 − e^{2at}) − ǫ (a12² σy²/(2a a22²))(1 − e^{2at}) + ǫ (a12 a21 σx²/(2a a22²))(1 − e^{4at}) + O(ǫ²)
      = Q0 + ǫ(Q1 + Q2) + O(ǫ²). (2.6)

Note that for sufficiently small values of ǫ the approximation to the variance is always positive. In Figure 1 we show how the approximation to the variance (2.6) converges with ǫ to the variance of the slow variable x of the full multi-scale system (2.1)-(2.2). We plot the absolute error of the variances, EV = |E(x²) − (Eρ0(x²) + ǫEρ1(x²))|, using only the averaged system (i.e. Q0), with partial correction (i.e. Q0 + ǫQ1), and with full higher order correction (i.e. Q0 + ǫ(Q1 + Q2)). As expected from the singular perturbation theory, we observe linear scaling behaviour for the averaged system without the ǫ(Q1 + Q2) correction and quadratic scaling behaviour with the higher-order correction in (2.6). We should also point out that the covariances Q1 + Q2 are not Gaussian statistics (one can verify that the kurtosis is not equal to three times the square of the variance for a generic choice of parameters) and therefore it is mathematically unclear how to use this statistical correction in the data assimilation setting, since we have no information about the temporal evolution of these non-Gaussian statistics without a Fokker-Planck equation which supports a probabilistic solution.
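The correction (2.6) can be checked numerically without any sampling, since the covariance of the full linear system obeys a closed matrix (Lyapunov) ODE. The sketch below is my own check, not from the paper; the parameters are those used in Figure 2, and the evaluation time t = 3 (close to stationarity for a = −2) and step size are arbitrary choices:

```python
import numpy as np

def variance_full(eps, t, dt=1e-4, a11=-1.0, a12=1.0, a21=-1.0, a22=-1.0,
                  sx2=2.0, sy2=2.0):
    """Slow variance of the full system (2.1)-(2.2) at time t, obtained by
    integrating the exact covariance ODE dP/dt = B P + P B^T + S, P(0) = 0."""
    B = np.array([[a11, a12], [a21 / eps, a22 / eps]])
    S = np.diag([sx2, sy2 / eps])  # noise covariances sigma_x^2, sigma_y^2/eps
    P = np.zeros((2, 2))
    for _ in range(int(t / dt)):
        P = P + (B @ P + P @ B.T + S) * dt
    return P[0, 0]

def variance_expansion(eps, t, a11=-1.0, a12=1.0, a21=-1.0, a22=-1.0,
                       sx2=2.0, sy2=2.0):
    """Evaluate Q0 and Q0 + eps*(Q1 + Q2) from (2.6)."""
    a = a11 - a12 * a21 / a22
    Q0 = -sx2 / (2 * a) * (1 - np.exp(2 * a * t))
    Q1 = -a12**2 * sy2 / (2 * a * a22**2) * (1 - np.exp(2 * a * t))
    Q2 = a12 * a21 * sx2 / (2 * a * a22**2) * (1 - np.exp(4 * a * t))
    return Q0, Q0 + eps * (Q1 + Q2)

eps, t = 0.05, 3.0
p_full = variance_full(eps, t)
Q0, p_approx = variance_expansion(eps, t)
print(p_full, Q0, p_approx)
```

For these parameter values the error of Q0 alone is O(ǫ), while the error of Q0 + ǫ(Q1 + Q2) is O(ǫ²), so the corrected value lies markedly closer to the exact slow variance.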


(b) Stochastic invariant manifold theory: An approximate reduced slow diffusive model

In this section we will derive an approximate reduced stochastic differential equation for the general linear system (2.1)-(2.2). The resulting equation will produce inferior second order statistics compared to the singular perturbation theory described in the previous subsection (cf. (2.6)), but will be advantageous in the context of data assimilation, as we will demonstrate in the next section.

We employ here invariant manifold theory (see for example Berglund & Gentz 2005, Fenichel 1979, Boxler 1989, Roberts 2008). We perform a formal perturbation theory directly with equations (2.1)-(2.2). Ignoring terms of O(ǫ) in (2.2) we obtain

y = h(x, t) = −(a21/a22) x − (√ǫ σy/a22) Ẇy + O(ǫ).

The slow stochastic invariant manifold h(x, t) is hyperbolic, satisfying the requirements outlined in Fenichel (1979). Substituting into (2.1) and ignoring the O(ǫ) contribution, we arrive at a reduced equation for the slow variable,

dX = aX dt + σx dWx − √ǫ (a12 σy/a22) dWy, (2.7)

which contains an additional additive noise term when compared to the averaged reduced equation (2.3). Note that whereas in the framework of the Fokker-Planck equation employed in singular perturbation theory the diffusion term and the drift term of the fast equation are of the same order in ǫ, invariant manifold theory works directly with the stochastic differential equation and hence allocates different orders of ǫ to the drift and diffusion terms.

We now show that solutions of (2.7) converge to solutions xǫ(t) of the full system (2.1)-(2.2) in a pathwise sense. There exists a vast literature on stochastic invariant manifold theory (see for example the monograph by Berglund & Gentz (2005)). There are, however, not many convergence results which establish the scaling with the time scale separation ǫ. In Appendix B in the electronic supplementary material, we show that the error e(t) = xǫ(t) − X(t), where X solves (2.7), is bounded for finite time T by

E( sup_{0≤t≤T} |e(t)|² ) ≤ c ǫ². (2.8)

This states that the rate of convergence of solutions of the stochastically reduced system (2.7) is one order better than for the averaged system (2.3). Figure 2 illustrates the convergence of solutions of the averaged model (2.3) and the reduced model (2.7) obtained from stochastic invariant manifold theory, confirming the estimate in (2.8). Figure 1, however, shows that this superior scaling of the solutions does not imply better scaling in the convergence of the variances. Recall that the variance of the reduced system (2.7), E(X²) = Q0 + ǫQ1, constitutes a truncation of the O(ǫ)-correction (see (2.6)), but note the improvement in the actual error when compared to the averaged system with E(X²) = Q0 (again, see Figure 2).
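The pathwise bound can be probed numerically by driving the full system, the averaged model (2.3), and the reduced model (2.7) with the same Wiener increments. The sketch below is my own illustration under arbitrary discretisation choices (horizon, step size, ensemble size, seed), not the paper's experiment; it estimates E(sup |e(t)|²) for both reduced models:

```python
import numpy as np

def pathwise_errors(eps, T=5.0, dt=2e-4, reals=30, seed=1,
                    a11=-1.0, a12=1.0, a21=-1.0, a22=-1.0,
                    sx=np.sqrt(2), sy=np.sqrt(2)):
    """Estimate E(sup_t |e(t)|^2) for the averaged model (2.3) and the
    reduced model (2.7) against the full system (2.1)-(2.2), all three
    driven by the SAME Wiener increments so the comparison is pathwise."""
    rng = np.random.default_rng(seed)
    a = a11 - a12 * a21 / a22
    n = int(T / dt)
    sq = np.sqrt(dt)
    x = np.zeros(reals); y = np.zeros(reals)
    Xa = np.zeros(reals); Xr = np.zeros(reals)   # averaged / reduced solutions
    ea = np.zeros(reals); er = np.zeros(reals)   # running sup of |e(t)|^2
    for _ in range(n):
        dWx = rng.normal(0, sq, reals)
        dWy = rng.normal(0, sq, reals)
        x, y = (x + (a11 * x + a12 * y) * dt + sx * dWx,
                y + (a21 * x + a22 * y) / eps * dt + sy / np.sqrt(eps) * dWy)
        Xa = Xa + a * Xa * dt + sx * dWx                      # model (2.3)
        Xr = (Xr + a * Xr * dt + sx * dWx
              - np.sqrt(eps) * sy * (a12 / a22) * dWy)        # model (2.7)
        ea = np.maximum(ea, (x - Xa) ** 2)
        er = np.maximum(er, (x - Xr) ** 2)
    return ea.mean(), er.mean()

ea, er = pathwise_errors(eps=0.1)
print(ea, er)
```

With shared noise the additive √ǫ-term in (2.7) cancels the leading fluctuations that the fast variable injects into the slow dynamics, so the reduced model's sup-error is markedly smaller than that of the averaged model, consistent with the ǫ² versus ǫ scaling of (2.8).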

(c) Data assimilation for the linear model

In this section, we compare numerical results of filtering the linear system in (2.1)-(2.2), assimilating noisy observations,

zm = x(tm) + εom = Gu(tm) + εom, εom ∼ N (0, ro), (2.9)

of the slow variable x at discrete times tm with constant observation time interval ∆t = tm+1 − tm. For convenience, we define u = (x, y)^T and the observation operator G = [1, 0]. The observations in (2.9) are corrupted by unbiased Gaussian noise with variance ro. In our numerical simulations below, we define the signal-to-noise ratio, SNR = Var(x)/ro, such that SNR^{-1} indicates the relative observation error variance compared to the temporal (climatological) variance of x.

Figure 2. Convergence of solutions xǫ of the full model (2.1)-(2.2) to those of the reduced models, (2.3) and (2.7), on a time interval 0 ≤ t ≤ T = 250. Parameters are a11 = a21 = a22 = −1, a12 = 1 and σx = σy = √2. The supremum errors, Esup ≡ E(sup_{0≤t≤T} |e(t)|²), where e(t) is the difference between xǫ(t) and the solution of the averaged model (2.3) (circles) or of the reduced model (2.7) (squares), are plotted as functions of ǫ. Trajectories were averaged over 100 realisations of the Wiener processes. Linear regression of the two data sets reveals slopes of 0.9 and 1.8, respectively.

Suppose that u−m and R−m are the prior mean and error covariance estimates at time tm. In this Gaussian and linear setting, the optimal posterior mean, u+m, and error covariance, R+m, estimates are obtained by the standard Kalman filter formulae (Anderson & Moore 1979):

u+m = u−m + Km(zm − Gu−m),
R+m = (I − KmG)R−m, (2.10)
Km = R−m G^T (G R−m G^T + ro)^{-1}.

To complete one cycle of filtering (or data assimilation), the next prior statistical estimates are obtained by propagating the posterior according to

u−m+1 = F u+m, (2.11)
R−m+1 = F R+m F^T + Q, (2.12)

where for the true filter, F and Q are the 2 × 2 matrices associated with the dynamical propagation operator, corresponding to the full linear multi-scale system (2.1)-(2.2), and the model error covariance, respectively. We will compare the true filter with two reduced filtering strategies: (i) the Reduced Stochastic Filter (RSF), which implements a one-dimensional Kalman filter on Gaussian prior statistics produced by the standard averaging model (2.3); (ii) the Reduced Stochastic Filter with Additive correction (RSFA), which implements a one-dimensional Kalman filter on Gaussian prior statistics produced by the model (2.7) obtained from stochastic invariant manifold theory. For both reduced filters (RSF and RSFA), we implement the same formulae (2.10) and (2.11), replacing u with x, G = [1, 0] with G = 1, and using F = exp(a∆t). The difference between RSF and RSFA lies in the covariance update, where Q = Q0 for RSF and Q = Q0 + ǫQ1 for RSFA. Therefore, RSFA is nothing but RSF with an effective additive covariance inflation factor, ǫQ1.

To measure the filter accuracy, we compute a temporally averaged root-mean-square (RMS) error between the posterior mean estimate, x+m, and the underlying true signal, x(tm), for numerical simulations up to time T = 10,000. In Figure 3, we show the average RMS errors as functions of ǫ for ∆t = 1 and SNR^{-1} = 0.5 (left panel); as functions of SNR^{-1} for ǫ = 1 and ∆t = 1 (middle panel); and as functions of ∆t for ǫ = 1 and SNR^{-1} = 0.5 (right panel). Notice that the average RMS error for RSFA is almost identical to that of the true filter. More importantly, the additive inflation in RSFA improves the filter accuracy compared to RSF.
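A minimal twin experiment along these lines can be set up in a few dozen lines. The sketch below is my own illustration, not the paper's code: the truth is generated by Euler–Maruyama integration of (2.1)-(2.2), the observation noise variance is set from SNR^{-1} = 0.5, and the two scalar Kalman filters differ only in the model error variance Q, as described above. The step size, run length, and seed are arbitrary choices.

```python
import numpy as np

def run_experiment(eps=1.0, Dt=1.0, n_cycles=2000, dt=5e-3, seed=2,
                   a11=-1.0, a12=1.0, a21=-1.0, a22=-1.0,
                   sx=np.sqrt(2), sy=np.sqrt(2)):
    """Twin experiment: truth from the full system (2.1)-(2.2), observed
    every Dt with noise; RSF uses Q = Q0, RSFA uses Q = Q0 + eps*Q1."""
    rng = np.random.default_rng(seed)
    a = a11 - a12 * a21 / a22
    F = np.exp(a * Dt)
    Q0 = -sx**2 / (2 * a) * (1 - np.exp(2 * a * Dt))
    Q1 = -a12**2 * sy**2 / (2 * a * a22**2) * (1 - np.exp(2 * a * Dt))
    steps = int(Dt / dt)
    dW = rng.normal(0, np.sqrt(dt), (n_cycles * steps, 2))
    x = y = 0.0
    truth = np.empty(n_cycles)
    k = 0
    for m in range(n_cycles):
        for _ in range(steps):
            x, y = (x + (a11 * x + a12 * y) * dt + sx * dW[k, 0],
                    y + (a21 * x + a22 * y) / eps * dt + sy / np.sqrt(eps) * dW[k, 1])
            k += 1
        truth[m] = x
    ro = 0.5 * np.var(truth)              # observation noise: SNR^{-1} = 0.5
    obs = truth + rng.normal(0, np.sqrt(ro), n_cycles)

    def scalar_kf(Q):
        xm, Rm = 0.0, 1.0
        err = np.empty(n_cycles)
        for m in range(n_cycles):
            xm, Rm = F * xm, F * Rm * F + Q       # forecast, cf. (2.11)-(2.12)
            K = Rm / (Rm + ro)                    # scalar Kalman gain
            xm, Rm = xm + K * (obs[m] - xm), (1.0 - K) * Rm
            err[m] = xm - truth[m]
        return np.sqrt(np.mean(err**2))

    return scalar_kf(Q0), scalar_kf(Q0 + eps * Q1)

rmse_rsf, rmse_rsfa = run_experiment()
print(rmse_rsf, rmse_rsfa)
```

Since both filters see the same truth and observations, the comparison isolates the effect of the additive inflation ǫQ1; in this setting the RSFA error comes out below the RSF error, consistent with the left panel of Figure 3 at ǫ = 1.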

3. Nonlinear model

In this section, we consider solutions of the nonlinear SPEKF model as true signals. This nonlinear test model was introduced in Gershgorin et al. (2010a,b) for filtering multi-scale turbulent signals with hidden


Figure 3. Filter accuracy: average RMS errors as functions of ǫ for ∆t = 1 and SNR^{-1} = 0.5 (left panel); as functions of SNR^{-1} for ǫ = 1 and ∆t = 1 (middle panel); and as functions of ∆t for ǫ = 1 and SNR^{-1} = 0.5 (right panel). Each panel compares the true filter, RSF, RSFA, and the observation error. In these simulations, the model parameters were a11 = a21 = a22 = −1, a12 = 1 and σx = σy = √2.

intermittent instabilities. In particular, the SPEKF model is given by the following system of SDEs,

du/dt = −(γ + λ)u + b + b̂ + f(t) + σu Ẇu,
db/dt = −(λb/ǫ) b + (σb/√ǫ) Ẇb, (3.1)
dγ/dt = −(dγ/ǫ) γ + (σγ/√ǫ) Ẇγ,

with λ = γ̂ − iω and λb = γb − iωb. Here, u represents a resolved mode in a turbulent signal with the nonlinear mode-interaction terms replaced by an additive complex-valued noise term b(t) and a multiplicative stochastic damping γ(t), as is often done in turbulence modeling (Delsole 2004, Majda et al. 1999, 2001, 2003, 2008). In (3.1), Wu, Wb are independent complex-valued Wiener processes and Wγ is a real-valued standard Wiener process. The nonlinear stochastic system in (3.1) involves the following parameters: the stationary mean damping γ̂ and mean bias b̂, and two oscillation frequencies ω and ωb for the slow mode u; two damping parameters γb and dγ for the fast variables b and γ; three noise amplitudes σu, σb, σγ > 0; and a deterministic forcing f(t) of the slow variable u. Here, the two stochastic parameters b(t) and γ(t) are modeled with Ornstein-Uhlenbeck processes, rather than treated as constants or as Wiener processes (Friedland 1969, 1982). For our purpose, we consider the temporal scales of b, γ to be of order t/ǫ, which leaves the system amenable to averaging for ǫ ≪ 1.

The nonlinear system in (3.1) has several attractive features as a test model. First, it has exactly solvable statistical solutions which are non-Gaussian, allowing us to study non-Gaussian prior statistics conditional on the Gaussian posterior statistics in a Kalman filter. This filtering method is called the “Stochastic Parameterized Extended Kalman Filter” (SPEKF). Note that it is different from the classical extended Kalman filter, which produces a Gaussian prior covariance matrix through the corresponding linearized tangent model (see e.g., Anderson & Moore 1979, Kalnay 2002). Second, a recent study by Branicki et al. (2012) suggests that the system (3.1) can reproduce signals in various turbulent regimes, such as intermittent instabilities in a turbulent energy transfer range and in a dissipative range, as well as laminar dynamics. In Figure 4, we show pathwise solutions Re[u(t)] of the system in (3.1) without deterministic forcing (f(t) = 0), unbiased b̂ = 0, frequencies ω = 1.78 and ωb = 1, for a non-time-scale-separated situation with ǫ = 1, in three turbulent regimes considered in Branicki et al. (2012):

I. Turbulent energy transfer range. The dynamics of u(t) is dominated by frequent rapid transient instabilities. The parameters are: γ̂ = 1.2, γb = 0.5, dγ = 20, σu = 0.5, σb = 0.5, σγ = 20. Here, γ decays faster than u.

II. Dissipative range. The dynamics of u(t) exhibits intermittent bursts of transient instabilities, followed by quiescent phases. The parameters are: γ̂ = 0.55, γb = 0.4, dγ = 0.5, σu = 0.1, σb = 0.4, σγ = 0.5. Here, u and γ have comparable decay time scales.


600 620 640 660 680 700 720 740 760 780 800−10

−5

0

5

10

Re[

u]

Regime I

600 620 640 660 680 700 720 740 760 780 800−10

−5

0

5

10

Re[

u]

Regime II

600 620 640 660 680 700 720 740 760 780 800−0.5

0

0.5

Re[

u]

Regime III

time

Figure 4. Pathwise solutions of the SPEKF model (real part of u) in the three turbulent regimes considered in Branicki et al. (2012). Notice that the vertical scale of regime III is much smaller than in the other two regimes.

III. Laminar mode. Here, u(t) has (almost surely) no transient instabilities. The parameters are: γ = 20, γb = 0.5, dγ = 0.25, σu = 0.25, σb = 0.5, σγ = 1. Here, u decays much faster than γ.
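For readers wishing to reproduce qualitative pathwise behaviour such as that shown in Figure 4, the following Python sketch integrates a SPEKF-type system with the Euler-Maruyama scheme. The exact form of (3.1) is not reproduced in this excerpt, so the drift structure below (a damped oscillator for u, with an Ornstein-Uhlenbeck additive bias b and an Ornstein-Uhlenbeck damping γ relaxing at rates γb/ǫ and dγ/ǫ) is an assumption modelled on Gershgorin et al. (2010b); the defaults are the regime I parameters, and the function name simulate_spekf is illustrative.

```python
import numpy as np

# Hypothetical sketch (Euler-Maruyama) of a SPEKF-type system; the exact form of
# (3.1) is assumed here to follow Gershgorin et al. (2010b):
#   du     = ((-gamma + i*omega)*u + b + f(t)) dt + sigma_u dW_u,
#   db     = (-gamma_b/eps + i*omega_b)*(b - b_hat) dt + sigma_b/sqrt(eps) dW_b,
#   dgamma = -(d_gamma/eps)*(gamma - gamma_hat) dt + sigma_gamma/sqrt(eps) dW_gamma,
# with f = 0 and b_hat = 0 as in the text. Defaults are the regime I parameters.
def simulate_spekf(T=800.0, dt=1e-3, eps=1.0, omega=1.78, omega_b=1.0,
                   b_hat=0.0, gamma_hat=1.2, gamma_b=0.5, d_gamma=20.0,
                   sigma_u=0.5, sigma_b=0.5, sigma_gamma=20.0, seed=0):
    rng = np.random.default_rng(seed)
    n = int(T / dt)
    u, b, g = 0.0 + 0.0j, b_hat + 0.0j, gamma_hat
    path = np.empty(n, dtype=complex)
    s = np.sqrt(dt / 2.0)  # complex Wiener increments with E|dW|^2 = dt
    for k in range(n):
        dWu = s * (rng.standard_normal() + 1j * rng.standard_normal())
        dWb = s * (rng.standard_normal() + 1j * rng.standard_normal())
        dWg = np.sqrt(dt) * rng.standard_normal()
        u = u + ((-g + 1j * omega) * u + b) * dt + sigma_u * dWu
        b = b + (-gamma_b / eps + 1j * omega_b) * (b - b_hat) * dt \
              + sigma_b / np.sqrt(eps) * dWb
        g = g - d_gamma / eps * (g - gamma_hat) * dt \
              + sigma_gamma / np.sqrt(eps) * dWg
        path[k] = u
    return path
```

Substituting the regime II or III parameter sets yields the qualitative behaviour of the other panels of Figure 4.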

We remark that regimes I and II exist only for sufficiently large values of ǫ; for smaller ǫ, the solutions in these two regimes qualitatively look like a laminar mode. Furthermore, for ǫ = 1, the following inequality is satisfied,

Ξn ≡ −nγ + ǫn²σγ²/(2dγ²) < 0,   (3.2)

for n = 1 in all three regimes; this is a sufficient condition for stable mean solutions (Branicki et al. 2012). For n = 2, the condition in (3.2) is only satisfied in regimes I and III. In regime I, where γ decays much faster than u, this condition implies bounded first and second order statistics (see Appendix D of Branicki & Majda 2013).
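The classification of the three regimes by the sign of Ξn in (3.2) is easy to verify directly; a minimal Python check (the function name xi is ours) at ǫ = 1:

```python
# Evaluate the stability indicator (3.2),
#   Xi_n = -n*gamma + eps * n**2 * sigma_gamma**2 / (2 * d_gamma**2),
# for the three regimes at eps = 1.
def xi(n, gamma, d_gamma, sigma_gamma, eps=1.0):
    return -n * gamma + eps * n**2 * sigma_gamma**2 / (2.0 * d_gamma**2)

regimes = {          # (gamma, d_gamma, sigma_gamma) from regimes I-III
    "I":   (1.2, 20.0, 20.0),
    "II":  (0.55, 0.5, 0.5),
    "III": (20.0, 0.25, 1.0),
}
for name, p in regimes.items():
    print(name, xi(1, *p), xi(2, *p))
# Xi_1 < 0 in all three regimes, while Xi_2 < 0 only in regimes I and III,
# in agreement with the text.
```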

Next, we will find a reduced stochastic prior model corresponding to the nonlinear system in (3.1). Subsequently, we will discuss the strong error convergence rate. Finally, we will compare the filtered solutions from the proposed reduced stochastic model with the true filter with exact statistical prior solutions and with various classical approximate filters.

(a) Reduced stochastic models

The goal of this section is to derive an approximate O(ǫ)-correction for the effective stochastic model in (3.1). As in the linear case, the O(1) dynamics is given by the averaged dynamics, where the average is taken over the unique invariant density generated by the fast dynamics of b and γ, which is

dU/dt = −λU + b + f(t) + σuẆu.   (3.3)

To recover stochastic effects of the fast dynamics neglected in (3.3), we employ invariant manifold theory. We do not pursue singular perturbation theory here, as it does not yield information on the temporal


Figure 5. Convergence of solutions uǫ of the full model (3.1) to those of several reduced models (averaged model, additive noise, combined noise) on a time interval 0 ≤ t ≤ T = 0.5. The supremum error, Esup ≡ E(sup_{0≤t≤T} |e(t)|²), where e(t) = uǫ(t) − U(t), is plotted as a function of ǫ. Trajectories were averaged over 500 realizations of the Wiener processes. Linear regression of the data sets reveals slopes close to 1. Parameters are those of regime I, where γ decays faster than u, and Ξ4 < 0 for these values of ǫ.

evolution of the statistics, as already encountered in the linear case of the previous section. We will show that invariant manifold theory naturally produces order-ǫ correction terms that consist of both additive and multiplicative noise.

Proceeding as in Section 2.2, we consider an O(√ǫ)-approximation of the invariant manifold of the fast subsystem,

b = √ǫ (σb/λb) Ẇb + O(ǫ),
γ = √ǫ (σγ/dγ) Ẇγ + O(ǫ).

Inserting into the equation for the slow complex-valued variable u in (3.1), we obtain an approximate reduced model,

dU/dt = −λU + b + f(t) + σuẆu + √ǫ ( (σb/λb) Ẇb − (σγ/dγ) U Ẇγ ).   (3.4)

Note that the reduced system (3.4) can be written as the averaged system (3.3) with added O(√ǫ) additive and multiplicative noise. This model allows for a complete analytical description of the statistics. In Appendix C (see the electronic supplementary material) we present explicit expressions for the first and second moments of (3.4). In Figure 5, we show the numerically simulated errors of solutions of the approximate reduced models when compared to solutions of the full multi-scale system (3.1). All reduced models, the classical averaged model (3.3), the reduced stochastic model (3.4), and the reduced model (3.4) with only additive noise, exhibit linear scaling with ǫ; the absolute error, though, is smaller for the higher-order models (3.4). Notice that the multiplicative noise does not contribute much here to the closeness of solutions; however, we will see below that it is significant for filtering turbulent signals when ǫ = 1. In the electronic supplementary material (Appendix D), we present a convergence proof for solutions of the reduced model (3.4).
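To make the relation between (3.3) and (3.4) concrete, the following Python sketch integrates both reduced models with the same Wiener increments, in the spirit of the pathwise comparison of Figure 5. It assumes f = 0; the complex coefficients lam and lam_b are treated as given inputs, since their definitions appear earlier in the paper and are not reproduced in this excerpt.

```python
import numpy as np

# Euler-Maruyama for the averaged model (3.3) and the reduced model (3.4) with
# f = 0, driven by the SAME Wiener increments so that their paths can be compared.
def integrate_reduced(T, dt, eps, lam, lam_b, d_gamma,
                      sigma_u, sigma_b, sigma_gamma, seed=0):
    rng = np.random.default_rng(seed)
    n = int(T / dt)
    U_avg = 0.0 + 0.0j            # averaged model (3.3)
    U_red = 0.0 + 0.0j            # reduced model (3.4)
    s = np.sqrt(dt / 2.0)         # complex increments with E|dW|^2 = dt
    for _ in range(n):
        dWu = s * (rng.standard_normal() + 1j * rng.standard_normal())
        dWb = s * (rng.standard_normal() + 1j * rng.standard_normal())
        dWg = np.sqrt(dt) * rng.standard_normal()
        U_avg = U_avg - lam * U_avg * dt + sigma_u * dWu
        # (3.4): O(sqrt(eps)) additive noise plus multiplicative noise in U
        U_red = U_red - lam * U_red * dt + sigma_u * dWu \
            + np.sqrt(eps) * (sigma_b / lam_b * dWb
                              - sigma_gamma / d_gamma * U_red * dWg)
    return U_avg, U_red
```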

(b) Data assimilation for the nonlinear model

In this section, we report numerical results from filtering the SPEKF model (3.1). We assume partial observations vm of u only,

vm = u(tm) + εom,  εom ∼ N(0, ro),   (3.5)


at discrete times tm with observation interval ∆t = tm+1 − tm. We choose ∆t = 0.5 in regimes I and II so that it is shorter than the decorrelation times, 0.833 and 1.81, respectively. In regime III, since the decorrelation time is much shorter, 0.12, we choose ∆t = 0.05. The observations in (3.5) are corrupted by unbiased Gaussian noise with variance ro. We report on observation noise variances corresponding to SNR−1 = 0.5, that is, ro is 50% of the climatological variance of u. We have checked that our results are robust when changing to smaller observational noise with SNR−1 = 0.1.

We will investigate the performance of our reduced models as forecast models in the Kalman filter for strong, moderate, and no time scale separation. We recall that the intermittent regime II only exists for sufficiently large values of ǫ ∼ O(1). We compare four filtering schemes, one perfect and three imperfect model experiments:

1. The true filter. This filtering scheme applies the classical Kalman filter formula (2.10) to update the first and second order non-Gaussian prior statistical solutions at each data assimilation step. These non-Gaussian prior statistics are semi-analytical statistical solutions of the nonlinear SPEKF model in (3.1) (see Gershgorin et al. 2010b, Majda & Harlim 2012, for the detailed statistics).

2. Reduced stochastic filter (RSF). This approach implements a one-dimensional Kalman filter on Gaussian prior statistical solutions of the averaged system (3.3).

3. Reduced stochastic filter with additive inflation (RSFA). This method implements a one-dimensional Kalman filter on Gaussian prior statistical solutions of the reduced stochastic model in (3.4), ignoring the O(√ǫ) multiplicative noise term. Technically, this method inflates the prior covariance of the reduced filtering approach in 2 with an additive correction factor, ǫσb²/(2|λb|²γ)(1 − e^{−2γ∆t}), associated with the order-√ǫ additive noise term in (3.4).

4. Reduced stochastic filter with combined, additive and multiplicative, inflation (RSFC). This filtering scheme applies a one-dimensional Kalman filter formula to update the non-Gaussian statistical solutions (see Appendix C in the electronic supplementary material) of the reduced stochastic model in (3.4). The presence of the multiplicative noise correction term yields a statistical correction of both the mean and the covariance. This is what we refer to as “statistical inflation” in the abstract and introduction.
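To make the additive inflation of scheme 3 concrete, the following Python sketch performs a single scalar Kalman analysis step with the inflated prior covariance; the function rsfa_update and its calling convention are illustrative, not the authors' implementation.

```python
import numpy as np

# One scalar Kalman analysis step with RSFA-style additive inflation: the prior
# variance is inflated by eps*sigma_b^2/(2*|lam_b|^2*gamma)*(1 - exp(-2*gamma*dt)),
# the variance contributed over one observation interval by the order-sqrt(eps)
# additive noise in (3.4).
def rsfa_update(prior_mean, prior_var, obs, ro, eps, sigma_b, lam_b, gamma, dt):
    inflation = eps * sigma_b**2 / (2.0 * abs(lam_b)**2 * gamma) \
        * (1.0 - np.exp(-2.0 * gamma * dt))
    r = prior_var + inflation      # inflated prior variance
    K = r / (r + ro)               # Kalman gain for a direct observation of u
    post_mean = prior_mean + K * (obs - prior_mean)
    post_var = (1.0 - K) * r
    return post_mean, post_var
```

A larger inflation increases the gain K, pulling the posterior mean closer to the observation; this is the mechanism by which the additive noise correction counteracts covariance underestimation.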

In Figure 6 we show the average RMS error as a function of the time scale separation parameter ǫ in regimes I and III. For sufficiently small values of ǫ all filters perform equally well, consistent with the asymptotic theory; here, the higher order correction is negligible since ǫ is small. For moderate time-scale separation with ǫ = 0.5 the two higher-order filters produce better skill than RSF, which is based on classical averaging theory. RSFC, which includes multiplicative noise, performs slightly better than RSFA; here, the transient instabilities have smaller magnitude compared to those with ǫ = 1 shown in Figure 4, and the RMS error suggests that RSFA has relatively high filtering skill. We do not show regime II for smaller ǫ since the solutions look like a laminar mode.

In the remainder of this section, we consider the three turbulent regimes in a tough test bed without time-scale separation, ǫ = 1. In the turbulent energy transfer regime I, γ decays much faster than u. The numerical results suggest that RSFC, including multiplicative noise, produces significantly better filtered estimates than all other filters, with RMS errors comparable to the true filter. Note that for this regime, the optimal inflation factor for which the smallest RMS errors are obtained is unphysically large, with a value of 7.1, and still produces RMS errors 10% larger than those of the true filter (results are not shown). This is analogous to the situation in ensemble filters found by Mitchell & Gottwald (2012). In Figure 7, we show the posterior estimates for regime I for observation error corresponding to SNR−1 = 0.5, compared to the true signal. Notice that the estimates from RSFC and the true filter are nearly identical; they capture almost every intermittent instability in this turbulent energy transfer range. The other two methods, RSF and RSFA, do not capture these intermittent instabilities. In this regime, we also see an improved covariance estimate with RSFC compared to RSF and RSFA (results are not shown).

In the dissipative regime II, we consider observation error √ro = 1.2786; this choice corresponds to SNR−1 = 0.5 of the temporal average over time t = [0, 400], ignoring the immense intermittent burst


Figure 6. Average RMS errors as functions of ǫ for the true filter, RSF, RSFA, and RSFC, together with the observation error, for SNR−1 = 0.5 in regimes I and III, with observation interval ∆t = 0.5 for regime I and ∆t = 0.05 for regime III. We assimilated until T = 1000.

that occurs in the time interval [540, 580] (see Figure 8). This regime is an extremely difficult test bed since the variance for the chosen parameters is unstable (that is, Ξ2 > 0), and furthermore the decay time scales of u and γ are comparable. Nevertheless, RSFC performs extremely well for observation times not too large when compared to the growth rate of the instability, Ξ2. RSFC exhibits an RMS error of 0.70, which is close to the true filter with an RMS error of 0.58 and much lower than the observation error, √ro = 1.2786.

The filters without multiplicative noise, RSF and RSFA, have RMS errors an order of magnitude higher, 14.9 and 13.3, respectively. In Figure 8 we show the posterior estimates for regime II. The multiplicative, i.e. amplitude dependent, nature of the noise clearly enables the filter to follow abrupt large amplitude excursions of the signal. This illustrates well the necessity of incorporating multiplicative noise in modelling highly non-Gaussian intermittent signals.

In the laminar regime III, all reduced filters do equally well, with RMS errors lying between the true filter and the observational noise (see Figure 6). The filtered estimate from RSFC is slightly less accurate than that of the true filter; we suspect that this deterioration arises because u decays much faster than γ in this parameter regime, and hence the reduced stochastic systems are not dynamically consistent with the full dynamics. We should also mention that for all three regimes with ǫ = 1, we also simulated the classical EKF with the linearized tangent model; its solutions diverge to infinity in finite time, due to the practical violation of the observability condition when filtering intermittently unstable dynamics in these turbulent regimes (see Chapter 3 of Majda & Harlim 2012, in a simpler context).

4. Summary and discussion

In this paper, we presented a study of filtering partially observed multi-scale systems with reduced stochastic models obtained from a systematic closure on the unresolved fast processes. In particular, we considered stochastic reduced models derived by singular perturbation theory as well as by invariant manifold theory. We not only showed convergence of solutions in the limit of large time scale separation, but also tackled the question of how the stochasticity induced by the unresolved scales can enhance the filtering skill, and how their diffusive behaviour can be translated into effective inflation. Our work suggests that incorporating results from stochastic perturbation theory may guide inflation strategies, which so far have been mostly ad-hoc, to more efficiently filter multi-scale systems in regimes where the asymptotic theory usually fails.

We focussed here on the realistic case of moderate and non-existing time scale separation in our data assimilation experiments, and showed how judicious model error, in particular additive and multiplicative noise, can enhance filter performance by providing sufficient dynamically adaptive statistical inflation. We systematically derived higher order additive noise in a reduced stochastic model that not only improves the path-wise estimate of solutions, but also improves the filtered estimates through a covariance inflation for the Kalman update, in the linear case and in a nonlinear example that mimics a laminar


Figure 7. Filtered posterior estimates (grey) compared to the truth (black dashes) for regime I with SNR−1 = 0.5.

mode of a turbulent signal. We also systematically derived multiplicative noise, which implies both mean and covariance corrections for the Kalman update, with a significant improvement of the filtering skill for intermittent nonlinear dynamics with sudden large amplitude excursions.

The main message of this work is that reduced stochastic models can be viewed as a dynamically consistent way to introduce covariance inflation as well as mean correction, guiding the filtering process. As already found by Mitchell & Gottwald (2012) in the context of ensemble filters, the improved skill of reduced stochastic models is not due to their ability to accurately reproduce the slow dynamics of the full system, but rather to the additional covariance inflation they provide. It is pertinent to mention, though, that skill improvement is not observed in regimes where the stochastic reduced model fails to sufficiently approximate the statistics of the full dynamics. For example, the stochastic reduced model (3.4) produces inferior skill to the classical averaged system (3.3) for large values of ǫ in the laminar regime, in which resolved variables decay much faster than the unresolved variables. Hence it is an important feature of the effective stochastic inflation that it is dynamically consistent with the full dynamics and state-adaptable, contrary to ad-hoc inflation techniques.

Finally, we should mention that although the study in this paper demonstrates the importance of statistical inflation in the form of additive and multiplicative stochastic forcings for optimal filtering, it does not tell us how to choose the parameters associated with these noises (ǫ, γb, σb, dγ, σγ) for real applications. This issue will be addressed in the future.

Acknowledgments

The authors thank Andrew Stuart for insightful discussions. GAG acknowledges support from the Australian Research Council. The research of JH was partially supported by the ONR grant N00014-11-1-0310 and the ONR MURI grant N00014-12-1-0912.


Figure 8. Filtered posterior estimates (grey) compared to the truth (black dashes) for regime II with SNR−1 = 0.5.

References

Anderson, B. & Moore, J. (1979), Optimal Filtering, Prentice-Hall, Englewood Cliffs, NJ.

Anderson, J. L. (2001), ‘An ensemble adjustment Kalman filter for data assimilation’, Monthly Weather Review 129(12), 2884–2903.

Anderson, J. L. (2007), ‘An adaptive covariance inflation error correction algorithm for ensemble filters’, Tellus A 59(2), 210–224.

Anderson, J. L. (2009), ‘Spatially and temporally varying adaptive covariance inflation for ensemble filters’, Tellus A 61(2), 72–83.

Anderson, J. L. & Anderson, S. L. (1999), ‘A Monte Carlo implementation of the nonlinear filtering problem to produce ensemble assimilations and forecasts’, Monthly Weather Review 127(12), 2741–2758.

Berglund, N. & Gentz, B. (2005), Noise-Induced Phenomena in Slow-Fast Dynamical Systems, Springer.

Boxler, P. (1989), ‘A stochastic version of the centre manifold theorem’, Probab. Th. Rel. Fields 83, 509–545.

Branicki, M., Gershgorin, B. & Majda, A. (2012), ‘Filtering skill for turbulent signals for a suite of nonlinear and linear extended Kalman filters’, Journal of Computational Physics 231(4), 1462–1498.

Branicki, M. & Majda, A. (2013), ‘Fundamental limitations of polynomial chaos for uncertainty quantification in systems with intermittent instabilities’, Comm. Math. Sci. 11(1), 55–103.

Buizza, R., Houtekamer, P. L., Pellerin, G., Toth, Z., Zhu, Y. & Wei, M. (2005), ‘A comparison of the ECMWF, MSC, and NCEP global ensemble prediction systems’, Monthly Weather Review 133(5), 1076–1097.


Castronovo, E., Harlim, J. & Majda, A. (2008), ‘Mathematical test criteria for filtering complex systems: plentiful observations’, Journal of Computational Physics 227(7), 3678–3714.

Delsole, T. (2004), ‘Stochastic models of quasigeostrophic turbulence’, Surveys in Geophysics 25, 107–149.

Ehrendorfer, M. (2007), ‘A review of issues in ensemble-based Kalman filtering’, Meteorologische Zeitschrift 16(6), 795–818.

Evensen, G. (1994), ‘Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics’, Journal of Geophysical Research 99(C5), 10143–10162.

Evensen, G. (2006), Data Assimilation: The Ensemble Kalman Filter, Springer, New York.

Fenichel, N. (1979), ‘Geometric singular perturbation theory for ordinary differential equations’, Journal of Differential Equations 31(1), 53–98.

Friedland, B. (1969), ‘Treatment of bias in recursive filtering’, IEEE Trans. Automat. Contr. AC-14, 359–367.

Friedland, B. (1982), ‘Estimating sudden changes of biases in linear dynamical systems’, IEEE Trans. Automat. Contr. AC-27, 237–240.

Gershgorin, B., Harlim, J. & Majda, A. (2010a), ‘Improving filtering and prediction of spatially extended turbulent systems with model errors through stochastic parameter estimation’, J. Comput. Phys. 229(1), 32–57.

Gershgorin, B., Harlim, J. & Majda, A. (2010b), ‘Test models for improving filtering with model errors through stochastic parameter estimation’, J. Comput. Phys. 229(1), 1–31.

Givon, D., Kupferman, R. & Stuart, A. (2004), ‘Extracting macroscopic dynamics: Model problems and algorithms’, Nonlinearity 17(6), R55–127.

Hamill, T. M. & Whitaker, J. S. (2011), ‘What constrains spread growth in forecasts initialized from ensemble Kalman filters’, Monthly Weather Review 139, 117–131.

Harlim, J. (2011), ‘Numerical strategies for filtering partially observed stiff stochastic differential equations’, Journal of Computational Physics 230(3), 744–762.

Harlim, J. & Majda, A. (2013), ‘Test models for filtering with superparameterization’, Multiscale Model. Simul. 11(1), 282–307.

Houtekamer, P. L. & Mitchell, H. L. (1998), ‘Data assimilation using an ensemble Kalman filter technique’, Monthly Weather Review 126(3), 796–811.

Houtekamer, P. L., Mitchell, H. L. & Deng, X. (2009), ‘Model error representation in an operational ensemble Kalman filter’, Monthly Weather Review 137(7), 2126–2143.

Imkeller, P., Namachchivaya, N. S., Perkowski, N. & Yeong, H. C. (2012), ‘A homogenization approach to multiscale filtering’, Procedia IUTAM 5, 34–45. IUTAM Symposium on 50 Years of Chaos: Applied and Theoretical.

Kalnay, E. (2002), Atmospheric Modeling, Data Assimilation and Predictability, Cambridge University Press, Cambridge.

Kang, E. & Harlim, J. (2012), ‘Filtering partially observed multiscale systems with heterogeneous multiscale methods-based reduced climate models’, Monthly Weather Review 140(3), 860–873.

Khasminskii, R. (1968), ‘On the averaging principle for Itô stochastic differential equations’, Kybernetika, Czechoslovakia (in Russian) 4(3), 260–279.


Kurtz, T. G. (1973), ‘A limit theorem for perturbed operator semigroups with applications to random evolutions’, Journal of Functional Analysis 12(1), 55–67.

Law, K. & Stuart, A. (2012), ‘Evaluating data assimilation algorithms’, Monthly Weather Review 140, 3757–3782.

Li, H., Kalnay, E. & Miyoshi, T. (2009), ‘Simultaneous estimation of covariance inflation and observation errors within an ensemble Kalman filter’, Quarterly Journal of the Royal Meteorological Society 135(639), 523–533.

Majda, A. J., Franzke, C. & Khouider, B. (2008), ‘An applied mathematics perspective on stochastic modelling for climate’, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 366(1875), 2427–2453.

Majda, A. J. & Harlim, J. (2012), Filtering Complex Turbulent Systems, Cambridge University Press, Cambridge.

Majda, A. J., Timofeyev, I. & Vanden Eijnden, E. (1999), ‘Models for stochastic climate prediction’, Proceedings of the National Academy of Sciences 96(26), 14687–14691.

Majda, A. J., Timofeyev, I. & Vanden Eijnden, E. (2001), ‘A mathematical framework for stochastic climate models’, Communications on Pure and Applied Mathematics 54(8), 891–974.

Majda, A. J., Timofeyev, I. & Vanden-Eijnden, E. (2003), ‘Systematic strategies for stochastic mode reduction in climate’, Journal of the Atmospheric Sciences 60(14), 1705–1722.

Mitchell, L. & Gottwald, G. A. (2012), ‘Data assimilation in slow-fast systems using homogenized climate models’, Journal of the Atmospheric Sciences 69(4), 1359–1377.

Miyoshi, T. (2011), ‘The Gaussian approach to adaptive covariance inflation and its implementation with the Local Ensemble Transform Kalman Filter’, Monthly Weather Review 139, 1519–1535.

Ng, G.-H. C., McLaughlin, D., Entekhabi, D. & Ahanin, A. (2011), ‘The role of model dynamics in ensemble Kalman filter performance for chaotic systems’, Tellus A 63, 958–977.

Palmer, T. N. (2001), ‘A nonlinear dynamical perspective on model error: A proposal for non-local stochastic-dynamic parametrization in weather and climate prediction models’, Quarterly Journal of the Royal Meteorological Society 127(572), 279–304.

Papanicolaou, G. (1976), ‘Some probabilistic problems and methods in singular perturbations’, Rocky Mountain J. Math 6, 653–673.

Pavliotis, G. A. & Stuart, A. M. (2008), Multiscale Methods: Averaging and Homogenization, Springer, New York.

Pavliotis, G. & Stuart, A. (2007), ‘Parameter estimation for multiscale diffusions’, Journal of Statistical Physics 127(4), 741–781.

Petrie, R. E. & Dance, S. L. (2010), ‘Ensemble-based data assimilation and the localisation problem’, Weather 65(3), 65–69.

Roberts, A. J. (2008), ‘Normal form transforms separate slow and fast modes in stochastic dynamical systems’, Physica A 387, 12–38.

Sandu, A., Constantinescu, E. M., Carmichael, G. R., Chai, T., Seinfeld, J. H. & Daescu, D. (2007), ‘Localized ensemble Kalman dynamics data assimilation for atmospheric chemistry’, Lecture Notes Comput. Sci. 4487, 1018–1490.

Zhang, F. (2011), Parameter estimation and model fitting of stochastic processes, PhD thesis, University of Warwick.


Appendices for article: The role of additive and multiplicative noise in filtering complex dynamical systems

By Georg A. Gottwald1 and John Harlim2

1 School of Mathematics and Statistics, University of Sydney, NSW 2006, Australia, [email protected]

2 Department of Mathematics, North Carolina State University, BOX 8205, Raleigh, NC 27695, U.S.A., [email protected]

Appendix A. Singular perturbation expansion for linear problem

In this Appendix, we apply singular perturbation theory to obtain one order higher than the usual averaging limit. We recall the linear model (2.1)–(2.2),

dx = (a11x + a12y) dt + σx dWx,   (A 1)

dy = (1/ǫ)(a21x + a22y) dt + (σy/√ǫ) dWy,   (A 2)

for a slow variable x ∈ R and a fast variable y ∈ R. Here, Wx, Wy are independent Wiener processes and the parameter ǫ characterizes the time scale gap. We assume throughout that σx, σy ≠ 0 and that the eigenvalues of the matrix

A =
(  a11     a12
   a21/ǫ   a22/ǫ  )

are strictly negative, to assure the existence of a unique invariant joint density. Furthermore, we require a = a11 − a12 a22⁻¹ a21 < 0 to assure that the leading order slow dynamics supports an invariant density (cf. (A 11)). The probability density function of the system (A 1)-(A 2) satisfies a Fokker-Planck equation,

∂tρ(x, y, t) = (1/ǫ) L⋆0 ρ(x, y, t) + L⋆1 ρ(x, y, t),   (A 3)

where

L⋆0ϕ = −∂y((a21x + a22y)ϕ) + (σy²/2) ∂²ϕ/∂y²,
L⋆1ϕ = −∂x((a11x + a12y)ϕ) + (σx²/2) ∂²ϕ/∂x².   (A 4)

The probability density function is expanded according to

ρ(x, y, t) = ρ0 + ǫρ1 + ǫ²ρ2 + · · · .   (A 5)

Substituting the series (A 5) into the Fokker-Planck equation (A 3) and collecting orders of ǫ, we obtain at lowest order, O(1/ǫ),

L⋆0ρ0 = 0.   (A 6)

Since the fast dynamics is ergodic, there exists a unique solution to (A 6),

ρ0(x, y, t) = ρ∞(y;x)ρ(x, t),


with ρ∞(y; x) given by

ρ∞(y; x) ∝ exp(a22 ȳ²/σy²),  ȳ ≡ y + (a21/a22) x.   (A 7)
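The conditional density in (A 7) can be verified symbolically: the following sympy sketch checks that the unnormalized ρ∞ is annihilated by the fast Fokker-Planck operator L⋆0 of (A 4).

```python
import sympy as sp

# Check L0* rho_inf = 0 for rho_inf of (A 7), with
# L0* phi = -d/dy((a21*x + a22*y)*phi) + (sigma_y^2/2) d^2 phi/dy^2.
x, y, a21, a22, sy = sp.symbols('x y a21 a22 sigma_y', real=True)
ybar = y + a21 / a22 * x
rho_inf = sp.exp(a22 * ybar**2 / sy**2)       # unnormalized density
L0_star = -sp.diff((a21 * x + a22 * y) * rho_inf, y) \
          + sp.Rational(1, 2) * sy**2 * sp.diff(rho_inf, y, 2)
assert sp.simplify(L0_star) == 0
```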

At the next order, O(1), we obtain

L⋆0ρ1 = ∂tρ0 − L⋆1ρ0.   (A 8)

To assure boundedness of ρ1 (and thereby of the asymptotic expansion (A 5)), a solvability condition has to be satisfied, prescribed by the Fredholm alternative. Equation (A 8) is solvable only if the right-hand side lies in the space orthogonal to the (one-dimensional) null space of the adjoint L0 of L⋆0. The adjoint of the Fokker-Planck equation is called the backward Kolmogorov equation and propagates expectation values; hence, ergodicity of the fast dynamics implies that the infinitesimal generator L0 has constants as its only kernel modes. The solvability condition for (A 8) therefore reads

∫ dy ( ∂tρ0 − L⋆1ρ0 ) = 0,   (A 9)

which yields the evolution equation for ρ,

∂tρ = −∂x(axρ) + (σx²/2) ∂²ρ/∂x².   (A 10)

The associated averaged Langevin equation,

dX = aX dt + σx dWx,   (A 11)

supports an invariant measure provided a < 0. The solution of the reduced Fokker-Planck equation (A 10) is

ρ(x, t) ∝ exp( −x²/(2σ(t)²) ),   (A 12)

where

σ²(t) = −(σx²/(2a))(1 − e^{2at}).
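As a quick consistency check, this variance should satisfy the moment equation dσ²/dt = 2aσ² + σx² implied by the averaged Langevin equation (A 11), with σ²(0) = 0; a short sympy verification:

```python
import sympy as sp

# Verify that sigma^2(t) = -(sigma_x^2/(2a))*(1 - exp(2at)) solves
# d(sigma^2)/dt = 2*a*sigma^2 + sigma_x^2 with sigma^2(0) = 0.
a, t, sx = sp.symbols('a t sigma_x', real=True)
s2 = -sx**2 / (2 * a) * (1 - sp.exp(2 * a * t))
assert sp.simplify(sp.diff(s2, t) - (2 * a * s2 + sx**2)) == 0
assert s2.subs(t, 0) == 0
```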

To capture diffusive effects of the fast variable, we perform the perturbation theory one order higher. The solvability condition (A 9) assures that the O(1) equation (A 8) can be solved for ρ1. In order for ρ0 + ǫρ1 to be a density, we require the necessary condition on ρ1,

∫∫ dx dy ρ1 = 0.   (A 13)

After lengthy algebra, we find

ρ1(x, y, t) = { [ ( 2a11a21/(σy²a22) − 2a12a21²/(σy²a22²) ) x ȳ + ( a12a21/(σy²a22) − a21²σx²/(a22σy⁴) ) ( ȳ² + σy²/(2a22) ) ] ρ(x, t) + [ a12/a22 − 2a21σx²/(a22σy²) ] ȳ ∂xρ(x, t) + R(x, t) } ρ∞(y; x),   (A 14)

where R(x, t) lies in the kernel of L⋆0 and will be chosen to satisfy the condition in (A 13), which reduces to

∫ dx R(x, t) = 0.   (A 15)


The function R can be determined by requiring that the solution ρ1 satisfies the solvability condition of the next order, O(ǫ),

L⋆0ρ2 = ∂tρ1 − L⋆1ρ1,   (A 16)

that is,

∫ dy ( ∂tρ1 − L⋆1ρ1 ) = 0.   (A 17)

Substituting (A 14) into (A 17), we obtain

∂tR + ∂x(axR) − (σx²/2) ∂²R/∂x² = (a12a21a/a22²) ∂x(xρ) + ( a12²σy²/(2a22²) − a12a21σx²/a22² ) ∂xxρ,   (A 18)

the solutions of which can be found by a self-similarity ansatz, R(x, t) = σ(t)⁻¹ f(ξ), where ξ ≡ x σ(t)⁻¹. After some algebra, we find a solution that satisfies the condition in (A 15),

R(x, t) = ( a12/(σx²a22²) ) ( −a21σx² + a12σy²/2 − a21 a σ²(t) ) ( x²/σ²(t) − 1 ) ρ(x, t).   (A 19)

It is not too difficult to check that (A 14) with (A 19) satisfies the constraint in (A 13). Furthermore, we have Eρ1(x) = ∫∫ dx dy x ρ1(x, y) = 0, which implies that the O(ǫ)-correction does not contribute a correction to the mean of x when compared to the averaged system (A 11).

Appendix B. Convergence of solutions for the linear system

We formulate the following convergence result.

Theorem 0.1. Consider the linear multi-scale system (A 1)-(A 2). Assume that the matrix A has negative eigenvalues and that a = a11 − a12 a22⁻¹ a21 < 0. Let xǫ be the solution of (A 1)-(A 2) and X be the solution of the reduced equation,

dX = aX dt + σx dWx − √ǫ σy (a12/a22) dWy,   (B 1)

corresponding to the same realizations Wx, Wy and the same initial condition xǫ(0) = X(0). Then the error e(t) = xǫ(t) − X(t) is bounded for finite time T by

E( sup_{0≤t≤T} |e(t)|² ) ≤ cǫ².   (B 2)

Proof. We follow the general line of proof outlined in Pavliotis & Stuart (2008), and extend the results of Zhang (2011) to the next order. Consider the slow drift term f(x, y) = a11x + a12y and the averaged slow vector field F(x) = ax, where we recall a = a11 − a12 a22⁻¹ a21 < 0. Introducing the infinitesimal generators L0 and L1 as the formal L²-adjoints of the operators (A 4), consider the following Poisson equation,

L0φ(x, y) = f(x, y) − F(x) = a11x + a12y − ax,   (B 3)

for smooth functions φ satisfying ⟨φ⟩ρ∞ ≡ ∫ φ(x, y) ρ∞(y; x) dy = 0. The existence of solutions to the Poisson equation is assured by a Fredholm alternative since ⟨f − F⟩ρ∞ = 0. Using Itô's formula, we write

dt=

1

ǫL0φ+ L1φ+ σx∂xφWx +

σy√ǫ∂yφWy . (B 4)

Article submitted to Royal Society


Combining (B 3) and (B 4), we deduce that

f(x,y) = F(x) - \sqrt{\epsilon}\,\sigma_y\, \partial_y\phi\, \dot W_y + \epsilon\left( \frac{d\phi}{dt} - \mathcal{L}_1\phi - \sigma_x\,\partial_x\phi\, \dot W_x \right) (B 5)

and substituting this equality into the slow equation (A 1), we obtain

dx = F(x)\,dt + \sigma_x\,dW_x - \sqrt{\epsilon}\,\sigma_y\, \partial_y\phi\, dW_y + \epsilon\left( \frac{d\phi}{dt} - \mathcal{L}_1\phi - \sigma_x\,\partial_x\phi\, \dot W_x \right) dt. (B 6)

In the case of our linear system (A 1)-(A 2) we find \phi(y) = \frac{a_{12}}{a_{22}}\, y as a solution of the Poisson equation (B 3) and find

dx^\epsilon = a x^\epsilon\,dt + \sigma_x\,dW_x - \sqrt{\epsilon}\,\sigma_y\frac{a_{12}}{a_{22}}\,dW_y + \epsilon\,\frac{d\phi}{dt}\,dt. (B 7)

Note that this equation is simply a rewrite of (A 1), where we replace the fast variable y using (A 2). We have

\int_0^t \frac{d\phi}{ds}\,ds = \phi(t) - \phi(0) = \frac{a_{12}}{a_{22}}\left( y^\epsilon(t) - y^\epsilon(0) \right).

Integrating (B 7) and (B 1), we obtain the solution of the full system

x^\epsilon(t) = x^\epsilon(0) + a\int_0^t x^\epsilon(s)\,ds + \sigma_x \int_0^t dW_x - \sqrt{\epsilon}\,\sigma_y \frac{a_{12}}{a_{22}} \int_0^t dW_y + \epsilon\,\frac{a_{12}}{a_{22}}\left( y^\epsilon(t) - y^\epsilon(0) \right) (B 8)

and of the reduced system

X(t) = X(0) + a\int_0^t X(s)\,ds + \sigma_x \int_0^t dW_x - \sqrt{\epsilon}\,\sigma_y\frac{a_{12}}{a_{22}} \int_0^t dW_y, (B 9)

respectively. Defining the error e(t) = x^\epsilon(t) - X(t) with e(0) = 0, we have

e(t) = a\int_0^t e(s)\,ds + \epsilon\,\frac{a_{12}}{a_{22}}\left( y^\epsilon(t) - y^\epsilon(0) \right),

such that upon using the triangle inequality we obtain the bound

E\left( \sup_{0\le t\le T} |e(t)|^2 \right) \le 2\left( a^2 T \int_0^T E\left( \sup_{0\le r\le s} |e(r)|^2 \right) ds + \epsilon^2\, \frac{a_{12}^2}{a_{22}^2}\, E\left( \sup_{0\le t\le T} |y^\epsilon(t) - y^\epsilon(0)|^2 \right) \right). (B 10)

The upper bound in (B 2) is obtained upon using the Gronwall lemma and by applying the result in Zhang (2011),

E\left( \sup_{0\le t\le T} |y^\epsilon(t)|^2 \right) \le O\left( \log(T\epsilon^{-1}) \right). (B 11)

We remark that extensions to the multi-dimensional case can be readily made, provided suitable non-degeneracy conditions hold for the noise.
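The convergence statement of Theorem 0.1 can be probed numerically. A minimal Euler–Maruyama sketch (all coefficients illustrative, not from the paper) that drives the full system (A 1)-(A 2) and the reduced model (B 1) with the same Brownian increments and records the pathwise error for two values of ǫ:

```python
import numpy as np

# Sketch (illustrative parameters): integrate the linear multi-scale system
# and its reduced model (B 1) with the SAME Brownian increments; the pathwise
# error sup_t |x_eps - X| shrinks with eps, consistent with the bound (B 2).
rng = np.random.default_rng(0)
a11, a12, a21, a22 = -1.0, 1.0, 1.0, -2.0
sigx, sigy = 0.5, 0.5
a = a11 - a12 * a21 / a22            # effective drift, = -0.5 < 0

def sup_error(eps, dWx, dWy, dt):
    x, y, X = 1.0, 0.0, 1.0
    err, se = 0.0, np.sqrt(eps)
    for k in range(len(dWx)):
        # full system: slow variable x coupled to a fast OU-type variable y
        x_new = x + (a11 * x + a12 * y) * dt + sigx * dWx[k]
        y_new = y + (a21 * x + a22 * y) * dt / eps + sigy * dWy[k] / se
        # reduced model (B 1), driven by the same increments
        X = X + a * X * dt + sigx * dWx[k] - se * sigy * (a12 / a22) * dWy[k]
        x, y = x_new, y_new
        err = max(err, abs(x - X))
    return err

T, n = 1.0, 20000
dt = T / n
dWx = rng.normal(0.0, np.sqrt(dt), n)
dWy = rng.normal(0.0, np.sqrt(dt), n)
err_big = sup_error(0.1, dWx, dWy, dt)
err_small = sup_error(0.01, dWx, dWy, dt)
print(err_big, err_small)   # error decreases as eps decreases
```

The time step must resolve the fast scale (dt ≪ ǫ) for the explicit scheme to remain stable.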


Appendix C. First and second order statistics of the reduced model with additive and multiplicative noises

In this Appendix we compute the first and second order statistics of the reduced stochastic model in (D 1) with initial condition U(t_0) = U_0. Define

J(s,t) = \int_s^t \lambda\, ds' + \sqrt{\epsilon}\,\frac{\sigma_\gamma}{d_\gamma}\, dW_\gamma(s') = \lambda(t-s) + \sqrt{\epsilon}\,\frac{\sigma_\gamma}{d_\gamma}\left( W_\gamma(t) - W_\gamma(s) \right) \equiv \lambda(t-s) + J_W(s,t),

where

J_W(s,t) = \sqrt{\epsilon}\,\frac{\sigma_\gamma}{d_\gamma}\left( W_\gamma(t) - W_\gamma(s) \right), \qquad \langle J_W(s,t) \rangle = 0, \qquad \mathrm{Var}(J_W(s,t)) = \langle J_W(s,t)^2 \rangle = \epsilon\,\frac{\sigma_\gamma^2}{d_\gamma^2}\,(t-s).

The explicit solution for (D 1) can be written as follows,

U(t) = e^{-J(t_0,t)} U_0 + \int_{t_0}^t \left( \hat b + f(s) \right) e^{-J(s,t)}\,ds + \sigma_u \int_{t_0}^t e^{-J(s,t)}\,dW_u(s) + \sqrt{\epsilon}\,\frac{\sigma_b}{\lambda_b} \int_{t_0}^t e^{-J(s,t)}\,dW_b(s) \equiv A + B + C + D.

Therefore, we can compute the mean, the variance and the covariance,

\langle U(t) \rangle = \langle A \rangle + \langle B \rangle, (C 1)

\mathrm{Var}(U(t)) = \langle |U(t)|^2 \rangle - |\langle U(t) \rangle|^2,

\mathrm{Cov}(U(t),U(t)^*) = \langle U(t)^2 \rangle - \langle U(t) \rangle^2,

where

\langle |U(t)|^2 \rangle = \langle |A|^2 \rangle + \langle |B|^2 \rangle + \langle |C|^2 \rangle + \langle |D|^2 \rangle + 2\,\mathrm{Re}[\langle A^* B \rangle], (C 2)

\langle U(t)^2 \rangle = \langle A^2 \rangle + \langle B^2 \rangle + 2\langle AB \rangle. (C 3)

Now we compute each term on the right hand side of (C 1), (C 2) and (C 3). In all of these computations we will use the fact that

\langle z\, e^{b x} \rangle = \langle z \rangle\, e^{b\langle x \rangle + \frac{b^2}{2}\mathrm{Var}(x)},

\langle z\, w\, e^{b x} \rangle = \left( \langle z \rangle \langle w \rangle + \mathrm{Cov}(z,w^*) \right) e^{b\langle x \rangle + \frac{b^2}{2}\mathrm{Var}(x)},

for any constant b, assuming that the real valued Gaussian random variable x is independent of the complex valued Gaussian random variables z and w.

We find that

\langle A \rangle = e^{-\lambda(t-t_0)} \langle U_0\, e^{-J_W(t_0,t)} \rangle = e^{-\lambda(t-t_0)} \langle U_0 \rangle \langle e^{-J_W(t_0,t)} \rangle = \langle U_0 \rangle\, e^{-\lambda(t-t_0) + \frac{1}{2}\mathrm{Var}(J_W(t_0,t))} = \langle U_0 \rangle\, e^{\left(-\lambda + \frac{\epsilon}{2}\frac{\sigma_\gamma^2}{d_\gamma^2}\right)(t-t_0)},

\langle B \rangle = \int_{t_0}^t \left( \hat b + f(s) \right) e^{-\lambda(t-s)} \langle e^{-J_W(s,t)} \rangle\,ds = \int_{t_0}^t \left( \hat b + f(s) \right) e^{\left(-\lambda + \frac{\epsilon}{2}\frac{\sigma_\gamma^2}{d_\gamma^2}\right)(t-s)}\,ds,

\langle |A|^2 \rangle = e^{-2\gamma(t-t_0)} \langle |U_0|^2\, e^{-2J_W(t_0,t)} \rangle = \left( |\langle U_0 \rangle|^2 + \mathrm{Var}(U_0) \right) e^{2\left(-\gamma + \epsilon\frac{\sigma_\gamma^2}{d_\gamma^2}\right)(t-t_0)}.
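The formula for 〈A〉 reduces, for a deterministic U_0, to the lognormal identity 〈e^{−J_W}〉 = e^{Var(J_W)/2}, which can be checked by Monte Carlo. A minimal sketch; all parameter values are illustrative, not taken from the paper:

```python
import numpy as np

# Sketch: Monte Carlo check of the expression for <A> above. J_W(t0, t) is
# real Gaussian with mean 0 and variance eps * sig_g^2 / d_g^2 * (t - t0),
# so <exp(-J_W)> = exp(Var/2). All parameters are illustrative.
rng = np.random.default_rng(1)
eps, sig_g, d_g = 0.1, 1.0, 1.0
gamma, omega = 1.0, 2.0
lam = gamma - 1j * omega             # lambda = gamma - i*omega
t0, t = 0.0, 1.0
U0 = 1.0 + 0.5j                      # deterministic U0 for simplicity

var_JW = eps * sig_g**2 / d_g**2 * (t - t0)
JW = rng.normal(0.0, np.sqrt(var_JW), 500000)
A_mc = np.mean(U0 * np.exp(-lam * (t - t0) - JW))
A_exact = U0 * np.exp((-lam + 0.5 * eps * sig_g**2 / d_g**2) * (t - t0))
print(abs(A_mc - A_exact) / abs(A_exact))   # small relative error
```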


\langle |B|^2 \rangle = \int_{t_0}^t \int_{t_0}^t \left\langle \left( \hat b + f(s) \right) e^{-J(s,t)} \left( \left( \hat b + f(r) \right) e^{-J(r,t)} \right)^{*} \right\rangle ds\,dr

= \int_{t_0}^t \int_{t_0}^t \left( |\hat b|^2 + \hat b f(r)^* + \hat b^* f(s) + f(s) f(r)^* \right) e^{-2\gamma t + \lambda s + \lambda^* r} \left\langle e^{-J_W(s,t) - J_W(r,t)} \right\rangle ds\,dr

= \int_{t_0}^t \int_{t_0}^t \left( |\hat b|^2 + \hat b f(r)^* + \hat b^* f(s) + f(s) f(r)^* \right) e^{-2\gamma t + \lambda s + \lambda^* r}\, e^{\frac{\epsilon}{2}\frac{\sigma_\gamma^2}{d_\gamma^2}\left[ 2t - r - s + 2\min(t-r,\,t-s) \right]}\, ds\,dr.

Using the Itô isometry, we deduce

\langle |C|^2 \rangle = \sigma_u^2 \int_{t_0}^t e^{-2\gamma(t-s)} \langle e^{-2J_W(s,t)} \rangle\,ds = -\frac{\sigma_u^2}{2\left(-\gamma + \epsilon\frac{\sigma_\gamma^2}{d_\gamma^2}\right)} \left( 1 - e^{2\left(-\gamma + \epsilon\frac{\sigma_\gamma^2}{d_\gamma^2}\right)(t-t_0)} \right),

\langle |D|^2 \rangle = -\frac{\epsilon\,\sigma_b^2}{2\,|\lambda_b|^2 \left(-\gamma + \epsilon\frac{\sigma_\gamma^2}{d_\gamma^2}\right)} \left( 1 - e^{2\left(-\gamma + \epsilon\frac{\sigma_\gamma^2}{d_\gamma^2}\right)(t-t_0)} \right),

\langle A^* B \rangle = e^{-\lambda^*(t-t_0)} \langle U_0^* \rangle \int_{t_0}^t \left( \hat b + f(s) \right) e^{-\lambda(t-s)} \left\langle e^{-J_W(t_0,t) - J_W(s,t)} \right\rangle ds, (C 4)

where

\left\langle e^{-J_W(t_0,t) - J_W(s,t)} \right\rangle = e^{\frac{1}{2}\mathrm{Var}(J_W(t_0,t)) + \frac{1}{2}\mathrm{Var}(J_W(s,t)) + \mathrm{Cov}(J_W(t_0,t),\,J_W(s,t))}

= e^{\frac{\epsilon}{2}\frac{\sigma_\gamma^2}{d_\gamma^2}\left[ (t-t_0) + (t-s) + 2\min(t-t_0,\,t-s) \right]} = e^{\frac{\epsilon}{2}\frac{\sigma_\gamma^2}{d_\gamma^2}\left[ (t-t_0) + 3(t-s) \right]}, (C 5)

since t_0 \le s \le t. Substituting (C 5) into (C 4), we obtain

\langle A^* B \rangle = e^{\left(-\lambda^* + \frac{\epsilon}{2}\frac{\sigma_\gamma^2}{d_\gamma^2}\right)(t-t_0)} \langle U_0^* \rangle \int_{t_0}^t \left( \hat b + f(s) \right) e^{\left(-\lambda + \frac{3\epsilon}{2}\frac{\sigma_\gamma^2}{d_\gamma^2}\right)(t-s)}\, ds. (C 6)

Following the same procedure, we obtain

\langle A^2 \rangle = e^{-2\lambda(t-t_0)} \langle U_0^2\, e^{-2J_W(t_0,t)} \rangle = \left( \langle U_0 \rangle^2 + \mathrm{Cov}(U_0,U_0^*) \right) e^{2\left(-\lambda + \epsilon\frac{\sigma_\gamma^2}{d_\gamma^2}\right)(t-t_0)},

\langle AB \rangle = e^{-\lambda(t-t_0)} \langle U_0 \rangle \int_{t_0}^t \left( \hat b + f(s) \right) e^{-\lambda(t-s)} \left\langle e^{-J_W(t_0,t) - J_W(s,t)} \right\rangle ds = e^{\left(-\lambda + \frac{\epsilon}{2}\frac{\sigma_\gamma^2}{d_\gamma^2}\right)(t-t_0)} \langle U_0 \rangle \int_{t_0}^t \left( \hat b + f(s) \right) e^{\left(-\lambda + \frac{3\epsilon}{2}\frac{\sigma_\gamma^2}{d_\gamma^2}\right)(t-s)}\, ds,

\langle B^2 \rangle = \int_{t_0}^t \int_{t_0}^t \left\langle \left( \hat b + f(s) \right) e^{-J(s,t)} \left( \hat b + f(r) \right) e^{-J(r,t)} \right\rangle ds\,dr

= \int_{t_0}^t \int_{t_0}^t \left( \hat b^2 + \hat b f(r) + \hat b f(s) + f(s) f(r) \right) e^{-2\lambda t + \lambda s + \lambda r} \left\langle e^{-J_W(s,t) - J_W(r,t)} \right\rangle ds\,dr

= \int_{t_0}^t \int_{t_0}^t \left( \hat b^2 + \hat b f(r) + \hat b f(s) + f(s) f(r) \right) e^{-2\lambda t + \lambda s + \lambda r}\, e^{\frac{\epsilon}{2}\frac{\sigma_\gamma^2}{d_\gamma^2}\left[ 2t - r - s + 2\min(t-r,\,t-s) \right]}\, ds\,dr.
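The closed-form expression for 〈|C|²〉 above can likewise be checked by Monte Carlo, averaging the conditional second moment over realizations of W_γ. A minimal sketch with illustrative parameter values (not from the paper):

```python
import numpy as np

# Sketch: Monte Carlo check of <|C|^2> by averaging
# int_{t0}^t exp(-2*gamma*(t-s) - 2*J_W(s,t)) ds over W_gamma paths.
# All parameter values are illustrative.
rng = np.random.default_rng(2)
eps, sig_g, d_g, gamma, sig_u = 0.1, 1.0, 1.0, 1.0, 0.8
t0, t, n, paths = 0.0, 1.0, 200, 10000
s = np.linspace(t0, t, n + 1)
ds = s[1] - s[0]

dW = rng.normal(0.0, np.sqrt(ds), (paths, n))
W = np.concatenate([np.zeros((paths, 1)), np.cumsum(dW, axis=1)], axis=1)
JW = np.sqrt(eps) * sig_g / d_g * (W[:, -1:] - W)   # J_W(s, t) on the grid
integrand = np.exp(-2 * gamma * (t - s) - 2 * JW)

# trapezoidal rule along s, then average over paths
path_int = 0.5 * (integrand[:, :-1] + integrand[:, 1:]).sum(axis=1) * ds
C2_mc = sig_u**2 * np.mean(path_int)

mu = -gamma + eps * sig_g**2 / d_g**2
C2_exact = -sig_u**2 / (2 * mu) * (1 - np.exp(2 * mu * (t - t0)))
print(abs(C2_mc - C2_exact) / C2_exact)   # small relative error
```

Since the Brownian path is sampled exactly on the grid, the only errors are the quadrature in s and the Monte Carlo average over paths.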

Appendix D. Convergence of solutions for the nonlinear system

Theorem 0.2. Consider the SPEKF model in (3.1). Assume that f(t) is bounded, that

\Xi_n \equiv -n\gamma + \frac{\epsilon\, n^2 \sigma_\gamma^2}{2 d_\gamma^2} < 0 \qquad \text{for } 1 \le n \le 4,

and that γ has sufficient decay of correlations. Let u^\epsilon be a solution of (3.1) and U be a solution of

\frac{dU}{dt} = -\left( \lambda + \sqrt{\epsilon}\,\frac{\sigma_\gamma}{d_\gamma}\,\dot W_\gamma \right) U + \hat b + f(t) + \sigma_u\,\dot W_u + \sqrt{\epsilon}\,\frac{\sigma_b}{\lambda_b}\,\dot W_b, (D 1)


where λ = γ − iω, corresponding to the same realizations W_u, W_b, W_\gamma and the same initial condition u^\epsilon(0) = U(0). Then there is a constant C(\epsilon,T) such that the error e(t) = u^\epsilon(t) - U(t) is bounded for finite time T by

E\left( \sup_{0\le t\le T} |e(t)|^2 \right) \le c\,\epsilon^2 \left( C(\epsilon,T) + C\,\frac{\sigma_u^2\,\sigma_\gamma^2}{d_\gamma^3}\, T \log\left( 1 + \epsilon^{-1} d_\gamma T \right) \right) e^{c\left( 1 + \epsilon\frac{\sigma_\gamma^2}{d_\gamma^2} \right)T}. (D 2)

Proof. Following the proof of Theorem 0.1, we solve the following Poisson problem for φ,

\mathcal{L}_0 \phi = F_1(u,b,\gamma) - \bar F_1(u) = -\gamma u + b, (D 3)

\int\!\!\int \phi(u,b,\gamma)\, p_\infty(b,\gamma)\, db\, d\gamma = 0, (D 4)

where F_1(u,b,\gamma) = -(\gamma + \lambda)u + b + \hat b + f(t) and \bar F_1(u) = -\lambda u + \hat b + f(t) are the drift terms of the slow dynamics in (3.1) and of the reduced model in (D 1), respectively. One can verify that

\phi(u,b,\gamma) = \frac{\gamma u}{d_\gamma} - \frac{b}{\lambda_b} (D 5)

is a solution of the Poisson problem in (D 3)-(D 4). Applying the Itô formula we obtain

\frac{d\phi}{dt} = \frac{1}{\epsilon}\mathcal{L}_0\phi + \mathcal{L}_1\phi + \frac{\gamma}{d_\gamma}\,\sigma_u\,\dot W_u - \frac{\sigma_b}{\sqrt{\epsilon}\,\lambda_b}\,\dot W_b + \frac{u\,\sigma_\gamma}{\sqrt{\epsilon}\,d_\gamma}\,\dot W_\gamma,

and subsequently we can rewrite (D 3) as follows,

F_1(u,b,\gamma) = \bar F_1(u) + \mathcal{L}_0\phi = \bar F_1(u) + \epsilon\left( \frac{d\phi}{dt} - \mathcal{L}_1\phi \right) - \epsilon\,\frac{\gamma}{d_\gamma}\,\sigma_u\,\dot W_u + \sqrt{\epsilon}\,\frac{\sigma_b}{\lambda_b}\,\dot W_b - \sqrt{\epsilon}\,u\,\frac{\sigma_\gamma}{d_\gamma}\,\dot W_\gamma.

Substituting into the slow equation in (3.1), we have

\frac{du^\epsilon}{dt} = \bar F_1(u^\epsilon) + \sigma_u\,\dot W_u + \sqrt{\epsilon}\left( \frac{\sigma_b}{\lambda_b}\,\dot W_b - u^\epsilon\,\frac{\sigma_\gamma}{d_\gamma}\,\dot W_\gamma \right) + \epsilon\left( \frac{d\phi^\epsilon}{dt} - \mathcal{L}_1\phi^\epsilon - \frac{\gamma}{d_\gamma}\,\sigma_u\,\dot W_u \right). (D 6)

The solution of the full model (D 6) is given by

u^\epsilon(t) = u^\epsilon(0) + \int_0^t \left( -\lambda u^\epsilon(s) + \hat b + f(s) \right) ds + \sigma_u \int_0^t dW_u(s) + \sqrt{\epsilon}\,\frac{\sigma_b}{\lambda_b} \int_0^t dW_b(s) - \sqrt{\epsilon}\,\frac{\sigma_\gamma}{d_\gamma} \int_0^t u^\epsilon(s)\, dW_\gamma(s) + \epsilon\left( \phi^\epsilon(t) - \phi^\epsilon(0) \right) - \epsilon \int_0^t F_1(u^\epsilon(s))\,\frac{\gamma^\epsilon(s)}{d_\gamma}\, ds - \epsilon\,\frac{\sigma_u}{d_\gamma} \int_0^t \gamma^\epsilon(s)\, dW_u(s),

and the solution of the approximate model (D 1) is given by

U(t) = U(0) + \int_0^t \left( -\lambda U(s) + \hat b + f(s) \right) ds + \sigma_u \int_0^t dW_u(s) + \sqrt{\epsilon}\,\frac{\sigma_b}{\lambda_b} \int_0^t dW_b(s) - \sqrt{\epsilon}\,\frac{\sigma_\gamma}{d_\gamma} \int_0^t U(s)\, dW_\gamma(s).

Defining e(t) \equiv u^\epsilon(t) - U(t) with e(0) = 0, we obtain upon using the triangle inequality

E\left( \sup_{0\le t\le T} |e(t)|^2 \right) \le 4\left( \int_0^T E|e(s)|^2\, ds + \epsilon\,\frac{\sigma_\gamma^2}{d_\gamma^2}\, E\left| \int_0^T\!\!\int_0^T e(s)\,e(s')\, dW_\gamma(s)\, dW_\gamma(s') \right| + \epsilon^2\,\theta(T) + \epsilon^2\,\frac{\sigma_u^2}{d_\gamma^2}\, E\left| \int_0^T\!\!\int_0^T \gamma(s)\,\gamma(s')\, dW_u(s)\, dW_u(s') \right| \right), (D 7)

where

\theta(T) \equiv E\left( \sup_{0\le t\le T} \left| \phi(t) - \phi(0) - \int_0^t F_1(s)\,\frac{\gamma(s)}{d_\gamma}\, ds \right|^2 \right). (D 8)


By the Itô isometry, (D 7) becomes

E\left( \sup_{0\le t\le T} |e(t)|^2 \right) \le c\left( \left( 1 + \epsilon\frac{\sigma_\gamma^2}{d_\gamma^2} \right) \int_0^T E|e(s)|^2\, ds + \epsilon^2\,\theta(T) + \epsilon^2\,\frac{\sigma_u^2}{d_\gamma^2} \int_0^T E|\gamma(s)|^2\, ds \right)

\le c\left( \left( 1 + \epsilon\frac{\sigma_\gamma^2}{d_\gamma^2} \right) \int_0^T E|e(s)|^2\, ds + \epsilon^2\,\theta(T) + C\,\epsilon^2\,\frac{\sigma_u^2\,\sigma_\gamma^2}{d_\gamma^3}\, T \log\left( 1 + \epsilon^{-1} d_\gamma T \right) \right),

where we use Lemma 3.5 in Zhang (2011),

E\left( \sup_{0\le t\le T} |\gamma(t)|^2 \right) \le C\,\frac{\sigma_\gamma^2}{d_\gamma}\,\log\left( 1 + \epsilon^{-1} d_\gamma T \right). (D 9)

Provided γ(t) decorrelates sufficiently fast and additionally \Xi_n \equiv -n\gamma + \epsilon n^2 \sigma_\gamma^2/(2 d_\gamma^2) < 0 is satisfied, moments up to order n are bounded (see Appendix D in Branicki & Majda 2013). Since θ in (D 8) involves moments up to fourth order, our assumptions guarantee that there is a constant C(ǫ,T) such that θ(T) ≤ C(ǫ,T). Subsequently, the main result in (D 2) is obtained upon employing the Gronwall lemma.
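The assumption Ξ_n < 0 for 1 ≤ n ≤ 4 is a simple arithmetic condition on the parameters; a minimal sketch with illustrative values (not taken from the paper):

```python
# Sketch: evaluate the moment-boundedness condition
# Xi_n = -n*gamma + eps * n^2 * sig_g^2 / (2 * d_g^2) < 0 for n = 1,...,4,
# with illustrative parameter values.
eps, gamma, sig_g, d_g = 0.1, 1.0, 1.0, 1.0
Xi = [-n * gamma + eps * n**2 * sig_g**2 / (2 * d_g**2) for n in range(1, 5)]
print(Xi)   # all entries negative: moments up to fourth order stay bounded
```

Because the destabilizing term grows like n² while the damping grows like n, the condition is most restrictive at n = 4.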

The upper bound in (D 2) is not tight, since we do not have an explicit expression for C as a function of ǫ. Figure 5 suggests (at least for the parameters chosen there) that C ≈ O(ǫ^{-1}).

References

Branicki, M. & Majda, A. (2013), 'Fundamental limitations of polynomial chaos for uncertainty quantification in systems with intermittent instabilities', Comm. Math. Sci. 11(1), 55–103.

Pavliotis, G. A. & Stuart, A. M. (2008), Multiscale Methods: Averaging and Homogenization, Springer, New York.

Zhang, F. (2011), Parameter estimation and model fitting of stochastic processes, PhD thesis, University of Warwick.
