
Swiss Federal Institute of Technology Zurich
Seminar for Statistics, Department of Mathematics

Master Thesis, Summer 2012

Sylvain Robert

Sequential Monte Carlo methods for a dynamical model of stock prices

Submission Date: August 15th 2012

Adviser: Prof. Hans Rudolf Künsch
Co-Adviser: Prof. Didier Sornette






Abstract

Stock markets often exhibit behaviours that are far from equilibrium, such as bubbles and crashes. The model developed in Yukalov et al. (2009) aims at describing the dynamic of stock prices, and notably the way they deviate from their fundamental value.

The present work was interested in estimating the parameters of the model and in filtering the underlying mispricing process. Various Sequential Monte Carlo methods were applied to the problem at hand. In particular, a fully adapted Particle Filter was derived and showed the best performance.

While the filtering was well handled by the different methods, the estimation of the parameters was much more difficult. Nevertheless, it was possible to identify the market type, which qualitatively describes the dynamic of a stock.

The methods were first tested on simulated data before being applied to the Dow Jones Industrial Average. The latter application led to very interesting results. Indeed, the estimated model provided insight into the underlying dynamic, and the filtering of the mispricing process made it possible to shed new light on some important financial events of the last 40 years.


Contents

Notation

1 Introduction

2 Dynamical model of stock prices
2.1 Nonlinear dynamical model in continuous time
2.1.1 Equilibrium market?
2.1.2 Mispricing model
2.1.3 Interpretation of the parameters
2.1.4 Market types
2.1.5 Fundamental price
2.2 State-space representation
2.2.1 General state-space model
2.2.2 Discretization of the state variables
2.2.3 Observation process
2.2.4 Formulation of the model
2.3 Numerical simulation

3 Particle Filter
3.1 Bayesian filtering
3.2 Particle Filter algorithms
3.2.1 Sequential Importance Sampling
3.2.2 Sequential Importance Resampling
3.2.3 General Particle Filter
3.2.4 Sample depletion and resampling techniques
3.3 Advanced Particle Filters
3.3.1 Auxiliary Particle Filter
3.3.2 Fully Adapted Particle Filter
3.4 Illustration of the Particle Filter

4 Parameter estimation
4.1 Maximum Likelihood
4.2 Markov Chain Monte Carlo
4.3 State Augmentation
4.3.1 Fixed parameters
4.3.2 Artificial dynamic of parameters
4.3.3 Priors
4.4 Model checking
4.5 Why is the problem so hard?

5 Simulations
5.1 Market types
5.2 Filtering of the state
5.3 Augmented filtering
5.4 Extensive simulation study
5.4.1 Multiple realizations
5.4.2 Consistency of the estimator
5.5 Identifiability
5.6 Model checking
5.7 Conclusion of simulations

6 Application to financial data
6.1 Hyper-parameters
6.2 Parameter estimation
6.3 Model checking
6.4 Deterministic dynamical model
6.5 Filtering of the state
6.6 Interpretation of the mispricing dynamic

7 Conclusion

Bibliography

A Implementation
A.1 Simulate the model
A.2 Particle Filter
A.3 Normal SIR Particle Filter
A.4 Fully adapted Particle Filter
A.5 Parameter estimation


List of Figures

2.1 Graphical representation of a general state-space model

2.2 Graphical representation of the price dynamic

2.3 One example of simulated stock prices

3.1 An animated example of the Particle Filter

4.1 Why is the problem so hard?

4.2 Simulated Maximum Likelihood and filtering

5.1 Filtering of the state

5.2 Density plot of particles

5.3 Example of filtering of the parameters

5.4 Filtering of the parameters, a second example

5.5 Boxplot of estimated parameters for market type 1

5.6 Model checking for simulated data

6.1 Dow Jones Industrial Average

6.2 Estimated parameters

6.3 Model checking for the DJIA data

6.4 Phase portrait of the estimated dynamical system

6.5 Filtered state of the DJIA

6.6 Mispricing of the DJIA and financial events

List of Tables

5.1 Parameter values for different market types

5.2 Identifiability of market types

5.3 RMSE for different market types and methods. The last 3 columns show the recovery rate of the sign of α, of A, and of both, which determines the general type of market.

5.4 Standard deviation of estimates over 50 repetitions for different market types and methods.

6.1 Estimated parameters of the DJIA dynamic.


Notation

• Continuous description

p stock price

q log(p)

pf fundamental price

qf log(pf )

x the mispricing (q − qf )

y the drift of x

• Discrete description

∆t time increment between two observations

Pt the discrete version of p at time t

Qt the discrete version of q at time t

Xt the discrete version of x at time t

Yt the discrete version of y at time t

Rt the log-return at time t (Qt −Qt−1)


Chapter 1

Introduction

Every agent is rational and has access to full information, there are no arbitrage opportunities, and therefore prices reflect the fundamental value of stocks and markets are at equilibrium. That is the fairy tale told by neoclassical economists, but the reality out there is much more complex. Agents act irrationally, imitate each other and follow trends. Information is not evenly spread and the fundamental value of stocks is never known. Real financial time series are characterized by long-range correlations, fat-tailed distributions of returns, asymmetries, heteroscedasticity, etc. Markets exhibit periods of over-enthusiasm: there are times when stocks are priced at many times their actual earnings, as with the Internet companies back when they were thought to sweep away the Old Economy. These are times of bubbles, which are not meant to last forever. At some point the soar of prices is caught up by reality and the bubble has to deflate. This process can be slow and steady, but also brutal, as during financial crashes. That is when fortunes are broken, banks go bankrupt and times of recession begin, such as ours since the 2007-2009 financial crisis.

The model introduced in Yukalov et al. (2009) and presented in chapter 2 aims at explaining the dynamic of stock prices, and more specifically the way they deviate from their fundamental value. The main modelling step is to represent the price as a combination of the fundamental value of a stock and an unobserved, latent mispricing factor. This mispricing follows a second-order, nonlinear dynamic, represented by a diffusion process embodying the expectations of agents and various trading strategies, as well as properties of the markets. To some extent, it is possible to differentiate, at a macroscopic level, the effects of individual and collective behaviours. At the same time, the levels of regulation and of uncertainty about the fundamental value also influence the mispricing dynamic.

One of the main strengths of the model is that it accounts, within a single framework, for markets with a rich variety of dynamics, from classical equilibrium markets to markets exhibiting very complex out-of-equilibrium features such as the coexistence of various attractors and cycles.

The present work is mainly interested in estimating the parameters of this dynamical model from discretely observed stock prices and in filtering the unobserved mispricing process. After the diffusion equations have been reformulated in the state-space model framework, Sequential Monte Carlo (SMC) methods, and in particular the Particle Filter, are introduced in chapter 3. They are indeed powerful methods for estimating such complex systems with intractable distributions.

The main idea behind them is to estimate the filtering distributions with a large number of samples, or particles, that evolve according to the dynamic equations of the system. Then, when a new observation arrives, particles that fit the data well are selected and reproduced, while badly fitting ones are eliminated. In other words, the distribution is estimated sequentially, going from a prior to an updated posterior at each time step.
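This propagate-weight-resample loop can be written generically. The sketch below is purely illustrative and is not the thesis implementation (whose code is given in appendix A); `init`, `propagate` and `loglik` are hypothetical user-supplied callables describing the model.

```python
import numpy as np

def bootstrap_filter(y_obs, n, init, propagate, loglik, rng):
    """Generic bootstrap (SIR) particle filter: propagate, weight, resample.

    init(n, rng)            -> (n, d) array of initial particles
    propagate(particles, rng) -> particles moved one step under the dynamics
    loglik(y, particles)    -> length-n array of log p(y | particle)
    Returns the posterior-mean path, one row per observation.
    """
    particles = init(n, rng)
    means = []
    for y in y_obs:
        particles = propagate(particles, rng)    # sample from the prior dynamic
        logw = np.asarray(loglik(y, particles))  # weight particles by fit to data
        w = np.exp(logw - logw.max())            # stabilized, unnormalized weights
        w /= w.sum()
        idx = rng.choice(n, size=n, p=w)         # multinomial resampling
        particles = particles[idx]               # approximate posterior sample
        means.append(particles.mean(axis=0))
    return np.array(means)
```

A usage example would supply a random-walk `propagate` and a Gaussian `loglik`; the filter mean then tracks the hidden state.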

A basic version of the Particle Filter is presented first, before more advanced methods are introduced. In particular, a fully adapted Particle Filter for the present model is derived. Overall, the filtering of the unobserved mispricing dynamic proves to be handled well by the Particle Filter algorithms.

The problem of parameter estimation, on the other hand, turns out to be much more difficult. Different approaches are attempted and developed in chapter 4. Maximum likelihood, at least in its basic implementation, is not feasible in the present case, but can still give some indications of the properties of the model. An MCMC approach was attempted but turned out to be ineffective. In the end, extending the Particle Filter by including the parameters in the state of the system proves to be the best solution.

In order to test the methods, various simulations are performed and summarized in chapter 5. We explore the properties of the estimators in six different market types, intended to represent a fair variety of situations.

While the algorithms perform well in filtering the mispricing process from the observed stock prices, they have more trouble with parameter estimation. It appears that finding a precise estimate of the parameters is out of reach. It nevertheless remains possible to achieve a qualitative estimation of the market type to which a given stock belongs. While this may look somewhat disappointing, it is still a very interesting result, which can provide insight into the dynamic of stock prices.

In any case, the goal of this model is not to explain everything about the dynamic of stock prices, but rather to shed light on the specific aspect of mispricing, and in particular on identifying situations of positive or negative conventions, of bullish or bearish markets, as well as transitions between different regimes.

At the end of this work (chapter 6), the Dow Jones Industrial Average (DJIA) is analysed with respect to the present model. The estimation of the parameters leads to a model that provides some interesting insight into the underlying dynamic. In particular, the estimated market lies somewhere between equilibrium and a situation with coexisting positive and negative conventions. The filtering of the mispricing, on the other hand, makes it possible to highlight some important financial periods and events of the last 40 years and gives rise to a better understanding of the price dynamic by distinguishing between the fundamental value and the mispricing process.


Chapter 2

Dynamical model of stock prices

In this chapter, the dynamical model of the mispricing of an asset developed in Yukalov et al. (2009) is presented. As is common in finance, the dynamic is first described in continuous time (section 2.1). Nevertheless, for the problems of filtering and parameter estimation we are concerned with, it is more convenient to give its state-space representation (section 2.2). The chapter ends with some comments on the time scale of the phenomenon and an example of numerical simulation.

2.1 Nonlinear dynamical model in continuous time

In Yukalov et al. (2009), the stock price is modelled as a fundamental price plus a mispricing component, whose dynamic reflects some qualitative characteristics of a market, such as the speculative or mean-reverting behaviours of agents, or the level of regulation and uncertainty. The richness of its dynamic makes it possible to describe, within a single framework, markets at equilibrium and out of equilibrium. The motivations for such a model are given in this section, together with its mathematical description and interpretation.

2.1.1 Equilibrium market?

The variations of prices on the stock market result from the interactions of thousands of agents, from professional hedge funds to individual traders. Prices reflect the consensus, at a given time, on the value of a stock. This does not mean, though, that they represent the true value of the stock, given by the expected future discounted cash flows and dividends, but rather the price that people are ready to pay for this stock.

The standard view is that the randomness observed in prices is due to external information only, to which all investors have access. The resulting market is said to be at equilibrium. The usual way to model stock prices in this context is as a Geometric Brownian Motion (GBM), described by a continuous stochastic differential equation, as initiated by Bachelier (1900) and used in the standard Black and Scholes model (Black and Scholes, 1973).

However, some evidence tends to show that real markets exhibit features that are far from equilibrium theory, such as bubbles and crashes, asymmetries, fat-tailed distributions, changes of regime, etc. (Sornette, 2004). These features may be due to the interactions between agents, causing herding effects, to their irrational behaviours, or to different technical trading strategies, such as those based on trend following. The model at hand does not focus on the individual strategies of the agents, too complex to capture and to model, but on the resulting effect of many interacting agents at the macroscopic level of market prices.

2.1.2 Mispricing model

The present work is interested in a model, developed in Yukalov et al. (2009), that describes the dynamic of equilibrium and out-of-equilibrium markets in a unified framework. Its point of view is that there is a fundamental price, called pf, that reflects the true value of a stock, but the actual price p as observed in the market is something different. The latter is rather the combination of the fundamental price and what we call a mispricing factor x. To be more precise:

log(p) = log(pf ) + x (2.1)

The mispricing is thus a systematic deviation from the fundamental price. In times of bubbles, for example, the value of a stock is overestimated, resulting in a price much higher than its fundamental value.

The mechanisms at the origin of the mispricing are difficult to unravel, because they involve the interactions of many investors with heterogeneous strategies. The approach taken in Yukalov et al. (2009) is to model the aggregate behaviour of all the agents and its effect on the global market. The mispricing is described by the following system of stochastic differential equations:

dx = y dt + σx dWx (2.2)

dy = f(x, y) dt + σy dWy (2.3)

where Wx and Wy are two Wiener processes, possibly correlated. This description of the dynamic of the mispricing means that the changes are partly endogenous to the system, and partly due to external shocks. Otherwise stated, the rate of change of the mispricing is influenced by the mispricing itself and by its drift term, plus some random fluctuations. This particular form is intended to capture the feedback mechanisms at work when many actors interact and speculate on the prices.

By putting some structural constraints on the equations, specifically that the market is symmetric and that the function f in 2.3 is additive, and by using the self-similar approximation theory developed in Yukalov (1990), Yukalov et al. were able to find a solution to the system that does not drop higher-order terms. The resulting diffusion process is described as follows:

dx = y dt + σx dWx (2.4)

dy = fNL(x, y) dt + σy dWy (2.5)

dq = dx + dqf (2.6)

where q = log(p) and qf = log(pf ) and:


fNL(x, y) = αx + βy + Ax^3 exp(−x^2/µ^2) + By^3 exp(−y^2/λ^2) (2.7)
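As a minimal sketch, the drift function 2.7 translates directly into code; the argument names below are illustrative and simply mirror the symbols in the equation.

```python
import numpy as np

def f_nl(x, y, alpha, beta, A, B, mu, lam):
    """Nonlinear drift of the mispricing velocity, eq. (2.7)."""
    return (alpha * x + beta * y
            + A * x**3 * np.exp(-(x / mu)**2)   # collective term, active for large |x|
            + B * y**3 * np.exp(-(y / lam)**2)) # momentum term, active for large |y|
```

For example, `f_nl(0.0, 0.0, ...)` is always zero, so the origin (no mispricing, no drift) is a fixed point of the deterministic dynamic.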

In the next section, we will try to make a link between the parameters of this diffusion process and the economic mechanisms at work in the formation of prices.

2.1.3 Interpretation of the parameters

The parameters of the function 2.7 can be interpreted in economic terms as follows:

• α embodies the individual behaviour of agents. If α < 0, it means that they follow a mean-reverting strategy, while in times of bubbles or over-speculative markets, α is likely to be positive.

• β, if taken negative, can be seen as a factor of market friction, tending to reduce large changes in the mispricing drift. Positive values of β lead to markets with explosive prices, which can only be transient by definition. Because we do not consider time-varying parameters, we exclude the case of positive β in this study.

• A reinforces or reduces the effect of α. In a way, it represents the collective behaviour of the agents. The nonlinearity in x accounts for the fact that A has an influence only for sufficiently large mispricing. On the other hand, the exponential term restricts its influence to a limited range of x values.

• µ is important only when A > 0. It represents the uncertainty about the fundamental price: if µ is small, the uncertainty is small and the mispricing cannot become too big, while if µ is big, the uncertainty is big and the price can go far away from its fundamental value.

• B represents momentum strategies, based on the variation of the mispricing. B > 0 means that people tend to reinforce a trend, while with B < 0 they play contrarian strategies. The latter is almost never used by a majority of traders; therefore only positive values are analysed.

• λ controls the changes of the mispricing drift. If it is big, mispricing can occur easily, while if it is small, the mispricing is constrained to stay smaller. λ can thus be interpreted as market regulation.

2.1.4 Market types

The interplay of these different parameters creates a large variety of market types, reflecting individual and collective behaviours and, at the same time, properties of the market such as regulation and access to information. A thorough analysis of these market types in the deterministic case has been done in Yukalov et al. (2009).

In some configurations, the mispricing tends to zero, representing a classical market at equilibrium. In others, different attractors coexist, with positive and negative conventions. Limit cycles can also appear around attractors for some values of the parameters. In total, there are 21 types of markets, whose descriptions can be found in the reference paper. It is particularly interesting to notice that a unique dynamic description can account for so many different behaviours.

When stochasticity is added to the dynamic, the overall structure of the market types persists, but the basins of attraction become fuzzier and jumps can happen. This means, for example, that the mispricing can go from a positive convention to an equilibrium or a negative one. This feature is particularly interesting when one thinks about bubbles and crashes: it could account for the quick changes in prices that are observed in financial time series. The main focus of the present work will be to identify to which market type an observed stock belongs.

2.1.5 Fundamental price

There is an open question about the way to model the underlying fundamental price, because the model described above does not depend on it. Here we decided to adopt the standard view and model it as a GBM, with fixed drift µf and volatility σf:

dpf = µf pf dt + σf pf dWf (2.8)

where Wf is a Wiener process independent of Wx and Wy.

Taking the transformation qf = log(pf), using Ito's lemma to get the dynamic of qf, and plugging it into 2.6 together with 2.4, one obtains:

dqf = (µf − σf^2/2) dt + σf dWf (2.9)

dq = (y + µf − σf^2/2) dt + σf dWf + σx dWx (2.10)

This expresses the dynamic of the price p as a mix between the mispricing dynamic and the GBM dynamic of the fundamental price. We could have chosen a more complex model, with a stochastic volatility component such as GARCH, or any other alternative. Nevertheless, because of its simplicity, a GBM allows us to focus on the mispricing dynamic, where another model could have added more confusion. It assumes, in a way, that all the interesting part of the price dynamic comes from the mispricing, which is certainly reductive but very useful for analysing the power of the model.

2.2 State-space representation

In this section we introduce state-space models before giving a description of the continuous-time diffusion process in this discrete-time framework.

2.2.1 General state-space model

A state-space model is a compact way to describe the evolution of a dynamical system (Kunsch, 2001). They are widely used in engineering (e.g. for tracking or control applications), in speech modelling, in biology (e.g. for modelling DNA sequences) and in finance (e.g. stochastic volatility models).

[Figure: hidden chain Xt−1 → Xt → Xt+1, each state emitting an observation Yt−1, Yt, Yt+1]

Figure 2.1: Graphical representation of a general state-space model

A state-space model is composed of a vector of hidden variables Xt, representing the state of the system, and an observed variable Yt (which could itself be multivariate). The hidden variables evolve according to some dynamic, while the observations are a probabilistic function of Xt. As an example, in a tracking application, the state Xt is the position of an object together with its orientation, speed and acceleration, while the observations Yt are the noisy readings gathered by sensors. The dynamic of the state variables is then given by the control of the object's movement, which typically varies over time.

If the model is first order markovian, then the following properties hold:

Xt ⊥⊥ X1:t−2, Y1:t−1 | Xt−1 (2.11)

Yt ⊥⊥ X1:t−1, Y1:t−1 | Xt (2.12)

State-space models are also called Hidden Markov Models (HMM) in the literature, but in the present work we will not use this denomination, reserving it for models with a discrete state space.

State-space models are very general and can be used to describe all kinds of dynamics: deterministic and stochastic, linear and nonlinear, Gaussian and non-Gaussian, etc. They also make it easy to include time-evolving parameters as part of the hidden state, as will be done in section 4.3 on parameter estimation. A graphical representation of a markovian state-space model is given in figure 2.1.

2.2.2 Discretization of the state variables

The model for stock prices described in the previous section 2.1 is defined by a diffusion process. Its dynamic being continuous in time, it cannot be directly described within the state-space framework, which is concerned with discrete dynamics. One then needs to discretize the diffusion process to translate it into a state-space model. Using the Euler approximation scheme, equations 2.4 and 2.5 become:

Yt+∆t = Yt + fNL(Xt, Yt) ∆t + σy √∆t εy,t+∆t (2.13)

Xt+∆t = Xt + Yt ∆t + σx √∆t εx,t+∆t (2.14)

where εy, εx ∼ N(0, 1) are the increments of the Wiener processes. Henceforth we assume that they are uncorrelated, even though this assumption could be relaxed later.
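Under the uncorrelated-noise assumption, one step of this Euler scheme can be sketched as follows. This is a hypothetical helper, not the implementation of appendix A; note that the update of X uses the old value of Yt, exactly as in equation 2.14.

```python
import numpy as np

def euler_step(x, y, dt, alpha, beta, A, B, mu, lam, sx, sy, rng):
    """One Euler step of the discretized mispricing dynamic, eqs. (2.13)-(2.14)."""
    f = (alpha * x + beta * y
         + A * x**3 * np.exp(-(x / mu)**2)
         + B * y**3 * np.exp(-(y / lam)**2))       # drift fNL(Xt, Yt), eq. (2.7)
    y_new = y + f * dt + sy * np.sqrt(dt) * rng.standard_normal()
    x_new = x + y * dt + sx * np.sqrt(dt) * rng.standard_normal()  # uses old y
    return x_new, y_new
```

Iterating this step from an initial (x, y) produces a simulated mispricing path at time resolution ∆t.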


Remark on notation. The letters used here should not be confused with the ones in the previous section. In the present case they represent the variables defined in section 2.1, i.e. the mispricing x and its drift term y. We reserve capital letters for the discretized version of the model to make the distinction clear. Furthermore, for ease of notation, we will write Xt+1 instead of Xt+∆t, assuming that the time unit is ∆t.

In terms of the state-space models defined above, Yt and Xt are the hidden variables, the state of the system. The naive implementation in terms of a state-space model would have been to include the fundamental and observed prices in the state space as hidden variables. Furthermore, by discretizing equation 2.10 in the same way we did for x and y, we could have skipped the fundamental price. The state variables would then have been Yt, Xt and Qt. Nevertheless, an important issue would have arisen with such a naive definition. Indeed, what would the observations have been in such a case? If one says that it is the price, then it is observed perfectly; and, more importantly, it would not fulfil property 2.12, the independence of the observations given the state. All filtering techniques considered here require the observation process to be noisy, if one wants to avoid degeneracy problems. Therefore, it would have been necessary to define an arbitrary level of noise, making the model more dependent on tuning parameters. We decided to take another approach, described in the next section.

2.2.3 Observation process

In our state-space representation, we define the observations as the log-returns of the price: R_t = Q_t − Q_{t−1}. Recalling the definition of the mispricing given in 2.1 and using the properties of a GBM, one can write the observation process as follows:

R_t = X_t − X_{t−1} + µ_f ∆t + σ_f √∆t ε_f  (2.15)

where ε_f ∼ N(0, 1) is independent of ε_x and ε_y. Indeed, the fundamental price being a GBM, the log of the fundamental price Q_f follows a random walk, and therefore its increments are independent and normally distributed.

In this way, the log-returns are defined as a function of the state variables plus some noise. The interesting point of this representation is that the observation noise is not arbitrary, but can be justified with well-founded arguments about the fundamental price dynamics.

In terms of economic interpretation, however, what may look like a convenient mathematical reformulation is in fact a radical change of paradigm. Indeed, the fundamental price is usually considered as the primary quantity of interest. The observed price is then seen as a distortion of the true price, due to various phenomena related to the imperfections of the market. In this view, one would have chosen the fundamental price as the main state variable, whereas it is absent from our model.

The point is that the model at hand focuses on the mispricing process. The kind of questions we are interested in are: in what kind of regime are we? Is it a period of positive or negative convention? Is there a risk of a bubble? A justification for considering the fundamental price as simple noise can be found in Samuelson (1965), where properly anticipated prices are shown to fluctuate randomly. This suggests that the fundamental price is really purely random and can be seen as noise. The mispricing comes from a misinterpretation of the fundamental price, or from a speculative strategy, and it is likely to follow some patterns,


Figure 2.2: Graphical representation of the price dynamics. The hidden chains Y_{t−1} → Y_t → Y_{t+1} and X_{t−1} → X_t → X_{t+1} generate the observed returns R_{t−1}, R_t, R_{t+1}.

which the model at hand tries to describe. Within the scope of the present work, we are not interested in modelling the fundamental price perfectly, but rather in identifying the amplitude and the sign of the mispricing, or at least the market type to which a given stock belongs.

2.2.4 Formulation of the model

In order to reformulate the continuous model for the dynamics of stock prices in terms of state-space models, it has been necessary to discretize the diffusion equations and to redefine the observation process. Let us call the state of the system Z_t = (Y_t, X_t) and the observations R_t. It is often useful to write the equations of the system in terms of probability density functions. The transition density of the state is p(Z_{t+1} | Z_t). By looking at equations 2.13 and 2.14 and remembering that ε_y, ε_x are iid N(0, 1), it can be written as the following two marginal normal densities:

p(Y_{t+1} | Y_t, X_t) = 1/(σ_y √(2π∆t)) · exp( −(1/2) [ (Y_{t+1} − (Y_t + f_NL(X_t, Y_t)∆t)) / (σ_y √∆t) ]² )  (2.16)

p(X_{t+1} | Y_t, X_t) = 1/(σ_x √(2π∆t)) · exp( −(1/2) [ (X_{t+1} − (X_t + Y_t∆t)) / (σ_x √∆t) ]² )  (2.17)

The observation process is written as p(R_t | Z_t, Z_{t−1}). Taking equation 2.15, with ε_f ∼ N(0, 1) independent of ε_y and ε_x, it has the following normal density:

p(R_t | Z_t, Z_{t−1}) = 1/(σ_f √(2π∆t)) · exp( −(1/2) [ (R_t − (X_t − X_{t−1} + µ_f∆t)) / (σ_f √∆t) ]² )  (2.18)

For visualization purposes, the graphical representation of the model is given in figure 2.2. It looks much more complicated than a simple general state-space model. One has to notice first that the hidden variables have not been written as a vector but separately, which adds a layer to the graph. More importantly, the model is clearly non-Markovian: it does not fulfil property 2.12. One could, however, build a new process by grouping two time steps together to recover the independence property. Another approach to tackle this issue has been followed, which is fully described in section 3.1 on filtering.
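The densities 2.16–2.18 are plain univariate normals and can be evaluated directly. A minimal sketch (the function names are ours, and `f_nl` again stands for the model's nonlinear drift, supplied by the caller):

```python
import math

def norm_logpdf(v, mean, sd):
    # log-density of N(mean, sd^2) evaluated at v
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - 0.5 * ((v - mean) / sd) ** 2

def log_transition(y_next, x_next, y, x, f_nl, dt, sigma_y, sigma_x):
    # log p(Y_{t+1} | Y_t, X_t) + log p(X_{t+1} | Y_t, X_t), eqs. 2.16-2.17
    s = math.sqrt(dt)
    return (norm_logpdf(y_next, y + f_nl(x, y) * dt, sigma_y * s)
            + norm_logpdf(x_next, x + y * dt, sigma_x * s))

def log_observation(r, x_now, x_prev, mu_f, dt, sigma_f):
    # log p(R_t | Z_t, Z_{t-1}), eq. 2.18
    return norm_logpdf(r, x_now - x_prev + mu_f * dt, sigma_f * math.sqrt(dt))
```

Working on the log scale avoids underflow when these densities are later multiplied over many particles and time steps.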


2.3 Numerical simulation

The Euler approximation scheme described in the previous section has been used to simulate the dynamical model at hand. Some qualitative analysis tends to show that the dynamics occur at a time scale of about one year, or approximately 250 trading days. This is based on the typical time over which a cycle or a change of attractors takes place. When analysing real data, however, some adjustments will probably still be needed. If one time unit is one year and we study daily data, then ∆t = 1/250, which is small enough to ensure that the Euler scheme is a good approximation.

Once this is fixed, the other parameters can be adapted. The parameter µ_f can be set to the average yearly return, which can be estimated independently from data. For the simulation, the value 12.5% has been chosen, which seems to be empirically valid. The volatility of stock prices, or equivalently the standard deviation of the log-returns, is a combination of the standard deviation of the underlying fundamental price, modelled as a GBM, and of the mispricing process. Using equation 2.15, the daily volatility can be decomposed as follows:

√Var(R_t) = √( σ_f² ∆t + Var(X_t − X_{t−1}) )  (2.19)

≈ √( ∆t (σ_f² + σ_x²) )  (2.20)

The daily volatility of stock prices is empirically observed to be about 1-2%, hence the value of 1.5% has been set for the simulation. The time scale of the process being one year, or approximately 250 days, the volatility has to be scaled appropriately by multiplying it by √250. The important question is how much of the volatility comes from the fundamental price and how much comes from the mispricing dynamics. For simulation purposes, both are set to contribute half of the volatility. The case of σ_y is more difficult to decide, because it has no direct influence on the price volatility but on the general dynamics of the system, such as the probability of changing attractors.

One simulated trajectory over a period of 40 years is displayed in figure 2.3. The parameters correspond to a market with three attractors. It is interesting to notice how the price is a combination of the fundamental value and the mispricing. In the 1980s the mispricing X jumps from the zero attractor to a negative convention, producing a steep drop in the price. Other fluctuations are observed, due to the existence of cycles around the attractors.


Figure 2.3: One example of simulated stock prices, with parameters α = −1, A = 1, β = −1, B = 1, µ = 2, λ = 1, σ_f = σ_x = 0.015·√250/√2, σ_y = 3σ_x. The panels show the drift y, the mispricing x, the log fundamental price q_f and the log price q over the simulated period.


Chapter 3

Particle Filter

The general problem of filtering for state-space models is introduced in the first section of this chapter. Then Sequential Monte Carlo methods, and in particular the Particle Filter, are presented (section 3.2), following closely the recent tutorial Doucet and Johansen (2009) and the book Doucet et al. (2001). A naive implementation is given first, before a more general version of the Particle Filter is presented. More advanced techniques, such as the Auxiliary Particle Filter, are presented in section 3.3, and a fully adapted algorithm is derived for the mispricing model at hand. An illustration of the Particle Filter with an animated figure concludes the chapter (section 3.4).

3.1 Bayesian filtering

In filtering, one is interested in estimating the distribution of the state of the system at time t given the whole history of observations up to this time. For a usual Markovian state-space model with state Z and observations R, applying Bayes' theorem, it is in principle possible to write:

p(Z_{t+1} | R_{1:t+1}) = p(R_{t+1} | Z_{t+1}, R_{1:t}) p(Z_{t+1} | R_{1:t}) / p(R_{t+1} | R_{1:t})  (3.1)

= p(R_{t+1} | Z_{t+1}, R_{1:t}) ∫ p(Z_{t+1} | Z_t) p(Z_t | R_{1:t}) dZ_t / p(R_{t+1} | R_{1:t})  (3.2)

where we used the law of total probability and the Markov property on the second term in the numerator of equation 3.1:

p(Z_{t+1} | R_{1:t}) = ∫ p(Z_{t+1}, Z_t | R_{1:t}) dZ_t  (3.3)

= ∫ p(Z_{t+1} | Z_t, R_{1:t}) p(Z_t | R_{1:t}) dZ_t  (3.4)

= ∫ p(Z_{t+1} | Z_t) p(Z_t | R_{1:t}) dZ_t  (3.5)


This decomposition of the filtering density can be seen as a two-stage process, as used in the Kalman Filter. The first part, p(Z_{t+1} | R_{1:t}), is the prediction, and the second part, p(R_{t+1} | Z_{t+1}, R_{1:t}), is used in the update step to correct the prediction after having seen the new observation R_{t+1}. The Particle Filter presented in the next section makes use of this decomposition, at least in its first exposition.

Furthermore, it is important to notice that the filtering procedure is defined recursively, because the prediction density can itself be written as a function of the previous filtering density, as shown in equations 3.3-3.5.

Expression 3.2 would be directly useful if p(R_{t+1} | Z_{t+1}, R_{1:t}) were known. With a usual Markovian state-space model this is actually the case, because given the current state, an observation is independent of the past history. Then p(R_{t+1} | Z_{t+1}, R_{1:t}) = p(R_{t+1} | Z_{t+1}), which is the observation density, in principle known by definition of the model.

In our case, though, R_{t+1} needs to be conditioned on Z_{t+1} and Z_t to be independent of R_{1:t}, because the model is only second-order Markovian. One could have solved this problem by defining a new state grouping two time steps and correcting the transition densities accordingly. However, we take a different approach here, whose main idea is to condition on Z_t and integrate it out as follows:

p(R_{t+1} | Z_{t+1}, R_{1:t}) = ∫ p(R_{t+1}, Z_t | Z_{t+1}, R_{1:t}) dZ_t

= ∫ p(R_{t+1} | Z_{t+1}, Z_t, R_{1:t}) p(Z_t | Z_{t+1}, R_{1:t}) dZ_t

= ∫ p(R_{t+1} | Z_{t+1}, Z_t, R_{1:t}) · [ p(Z_{t+1} | Z_t, R_{1:t}) p(Z_t | R_{1:t}) / p(Z_{t+1} | R_{1:t}) ] dZ_t

= ∫ p(R_{t+1} | Z_{t+1}, Z_t) · [ p(Z_{t+1} | Z_t) p(Z_t | R_{1:t}) / p(Z_{t+1} | R_{1:t}) ] dZ_t  (3.6)

where we used Bayes' theorem to rewrite p(Z_t | Z_{t+1}, R_{1:t}) and the Markov properties of the model to drop the dependencies on R_{1:t}.

By taking the normalizing factor out of the integral and plugging equation 3.6 into equation 3.2, the terms p(Z_{t+1} | R_{1:t}) cancel out and the resulting filtering density becomes:

p(Z_{t+1} | R_{1:t+1}) = ∫ p(R_{t+1} | Z_{t+1}, Z_t) p(Z_{t+1} | Z_t) p(Z_t | R_{1:t}) dZ_t / p(R_{t+1} | R_{1:t})  (3.7)

where we can drop the denominator because it is just a normalizing constant. This expression is the same as 3.2 except that the likelihood term is put inside the integral. This recursive definition of the filtering density is, however, not directly tractable, because it involves a series of integrals whose number grows with t, except in some special cases like discrete state spaces or linear Gaussian models. Our model, however, is continuous and highly nonlinear, and thus precludes the use of any exact analytical method. This is the reason why we introduce Sequential Monte Carlo methods in the next section, which are well suited to evaluating high-dimensional integrals.


3.2 Particle Filter algorithms

The idea of Sequential Monte Carlo (SMC) methods is to use a Monte Carlo (MC) approximation of the filtering density p(Z_t | R_{1:t}) in 3.7. Indeed, if one is able to draw enough samples from this density, all its expectations can be computed with arbitrarily small variance, independently of the number of dimensions. The recursive nature of the filtering makes this task impossible directly. Sequential methods bypass this problem by drawing N samples, or particles, from the initial distribution and propagating them through the diffusion process. Each particle k at time t is associated with a weight w_t^{(k)}. An MC approximation of the integral in equation 3.7 is then given by:

p(Z_{t+1} | R_{1:t+1}) ∝ Σ_{k=1}^N p(R_{t+1} | Z_{t+1}, Z_t^{(k)}) p(Z_{t+1} | Z_t^{(k)}) w_t^{(k)}  (3.8)

The weights w_t^{(k)} have to be proportional to the likelihood of the observations, sum to one, and be updated sequentially. Furthermore, while in a first naive implementation the particles are evolved using the transition density of the diffusion process, in a second step better proposal distributions can be designed that take into account the observation at the next time step. Such improvements reduce the variance of the estimator and alleviate the problem of sample depletion; they will be described later in 3.3.2.

SMC methods applied to filtering are called Particle Filters. They were first introduced in Gordon et al. (1993) and have since become the standard tool for this kind of problem.

Particle Filters are clearly of a sequential nature. Indeed, a sample from p(Z_t | R_{1:t}) is required to sample from the next filtering density p(Z_{t+1} | R_{1:t+1}). It is possible to see each iteration as a two-stage process, as explained earlier. Indeed, if one has a particle Z_t^{(k)}, with associated weight w_t^{(k)} = 1/N, from the previous filtering density, a sample at the next iteration is obtained as follows:

• predict: Z_{t+1}^{(k)} ∼ p(Z_{t+1} | Z_t^{(k)})

• update: w_{t+1}^{(k)} ∝ p(R_{t+1} | Z_{t+1}^{(k)}, Z_t^{(k)})

This is the basic idea of the Particle Filter, which will be developed further in the coming sections.

3.2.1 Sequential Importance Sampling

A basic Sequential Importance Sampling (SIS) Particle Filter algorithm making use of the MC approximation in 3.8 would be to generate N particles at time 0 from some prior distribution, let them evolve according to the diffusion process, and update the weight of each sample as w_{t+1}^{(k)} ∝ w_t^{(k)} p(R_{t+1} | Z_{t+1}^{(k)}, Z_t^{(k)}). Nevertheless, such an algorithm suffers from an important drawback: the distribution of the weights becomes very skewed after a few time steps, leading quickly to a poor approximation of the filtering distribution, eventually with only one particle having non-zero weight. In fact, the variance of the resulting estimate increases exponentially with time. A first fix to this problem is resampling, presented in the next paragraph.


3.2.2 Sequential Importance Resampling

The idea of the Sequential Importance Resampling (SIR) Particle Filter is to sample N particles at each step from the approximated distribution. Instead of keeping the same particles and sequentially updating their weights as in SIS, one samples N new particles from the multinomial distribution {Z_t^{(k)}, w_t^{(k)}}_{k=1}^N at each step and gives each of them equal weight. Then, at the next step, each particle gets a weight proportional to the likelihood of the data: w_{t+1}^{(k)} ∝ p(R_{t+1} | Z_{t+1}^{(k)}, Z_t^{(k)}). The particles are evolved using the transition distribution of the diffusion. More precisely, the distribution at time t+1 of a particle k is given by p(Z_{t+1} | Z_t^{(k)}). The procedure is described in more detail in algorithm 1.

There are two ways to represent a probability distribution with a finite sample: with a set of weighted samples, or by drawing samples with replacement with probabilities proportional to the weights. SIS uses the former approximation while SIR uses the latter. The solution might look unreasonable at first sight, because it seems to add variance to the estimate. Nevertheless, this is the price to pay to prevent the effective number of samples from shrinking very quickly. In fact, when it comes to estimating moments of the filtering density, the weighted approximation obtained before the resampling step is to be preferred.

SIR can be compared intuitively to biological evolution. If one sees the set of particles as a population with different genes, the resampling step is in fact a selection of the fittest. The particles that fit the environment, or the data, best survive and have many offspring, while the ones that do not fit are very likely to die. This process places more particles in regions of the space where the distribution has more mass. It is indeed impossible to know in advance where these regions are, so a sequential adaptive procedure like this one is particularly well suited for the task.

Because particles are resampled regularly, they are no longer independent from each other. We say that the particles interact. This makes the properties of the estimator much more difficult to derive analytically.

3.2.3 General Particle Filter

If we have a sample Z_t^{(k)} from p(Z_t | R_{1:t}), then it is easy to obtain a sample from the prediction density p(Z_{t+1} | R_{1:t}, Z_t^{(k)}) by sampling from the transition density p(Z_{t+1} | Z_t^{(k)}). This can be viewed as the prior distribution of particle k at time t+1. This is the separation into predict and update steps that we have seen earlier. But indeed, nothing forces us to decompose it like that. To obtain a sample from the posterior of interest, taking into account the observation at time t+1, one can use Bayes' theorem and sample Z_{t+1} from:

p(Z_{t+1} | R_{t+1}, Z_t^{(k)}) ∝ p(R_{t+1} | Z_{t+1}, Z_t^{(k)}) p(Z_{t+1} | Z_t^{(k)})  (3.9)

The question is how to sample from this posterior distribution. With SIS one uses weighted samples, while with the SIR algorithm particles are resampled with probability proportional to the likelihood of the observation at time t. If the unconditional transition density used as prior is not close to the posterior, or, stated otherwise, if the new observation R_{t+1} is very informative, the distribution of weights becomes very skewed and this leads


Algorithm 1: SIR Particle Filter

Input: R_{1:T}
// Initialization:
sample Z_0^{(k)} ∼ p_0(Z), k = 1, ..., N
set w_0^{(k)} ← 1/N, k = 1, ..., N
// Time evolution:
for t ← 1 to T do
    for k ← 1 to N do
        // Propagate particle:
        Z_t^{(k)} ← diffusionProcess(Z_{t−1}^{(k)})
        // Compute weight:
        w_t^{(k)} ∝ p(R_t | Z_t^{(k)}, Z_{t−1}^{(k)})
    end
    // Resampling:
    for j ← 1 to N do
        sample Z_t^{(j)} ∼ Multinomial({Z_t^{(k)}, w_t^{(k)}}_{k=1}^N)
    end
end
return {{Z_t^{(k)}}_{k=1}^N}_{t=1}^T
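Algorithm 1 can be implemented generically. The sketch below is a minimal, model-agnostic version with our own function names: the toy random-walk example at the bottom is not the thesis model, and the likelihood is conditioned on both the new and the old state, as the second-order model requires.

```python
import math
import random

def sir_filter(obs, n, sample_prior, propagate, log_lik, rng=random):
    """Algorithm 1 (SIR Particle Filter) for a generic state-space model.
    propagate(z) draws from p(Z_t | Z_{t-1}); log_lik(r, z_new, z_old)
    returns log p(R_t | Z_t, Z_{t-1})."""
    particles = [sample_prior() for _ in range(n)]
    clouds = []
    for r in obs:
        # Propagate each particle and compute its (numerically stabilized) weight:
        pairs = [(z, propagate(z)) for z in particles]
        logw = [log_lik(r, z_new, z_old) for z_old, z_new in pairs]
        m = max(logw)
        w = [math.exp(lw - m) for lw in logw]
        total = sum(w)
        w = [wi / total for wi in w]
        # Multinomial resampling; all weights are then implicitly reset to 1/N:
        pairs = rng.choices(pairs, weights=w, k=n)
        particles = [z_new for _, z_new in pairs]
        clouds.append(particles)
    return clouds

# Toy first-order example: Z_t = Z_{t-1} + N(0, 0.1^2), R_t = Z_t + N(0, 0.5^2).
rng = random.Random(0)
truth, obs = 0.0, []
for _ in range(50):
    truth += rng.gauss(0.0, 0.1)
    obs.append(truth + rng.gauss(0.0, 0.5))
clouds = sir_filter(obs, 500,
                    sample_prior=lambda: rng.gauss(0.0, 1.0),
                    propagate=lambda z: z + rng.gauss(0.0, 0.1),
                    log_lik=lambda r, z_new, z_old: -0.5 * ((r - z_new) / 0.5) ** 2,
                    rng=rng)
filter_mean = sum(clouds[-1]) / len(clouds[-1])   # filtering mean at final time
```

Note that the pair (Z_{t−1}, Z_t) is resampled jointly, so that the next weight can still condition on the previous state.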


to a poor approximation of the posterior distribution. If one thinks of an accept/reject sampler, it makes the acceptance rate very low.

If the goal is to sample from the posterior distribution, it becomes obvious that it is not compulsory to use the transition density to evolve the particles. It is in principle possible to use other proposal distributions that are closer to the posterior of interest.

Because we want the proposal to be as close as possible to p(Z_{t+1} | R_{t+1}, Z_t^{(k)}), it makes sense to allow it to depend on R_{t+1} as well. We thus express the proposal as q(Z_{t+1} | R_{t+1}, Z_t^{(k)}). The weights of the particles now have to take the proposal density into account as well:

w_{t+1}^{(k)} ∝ p(R_{t+1} | Z_{t+1}^{(k)}, Z_t^{(k)}) p(Z_{t+1}^{(k)} | Z_t^{(k)}) / q(Z_{t+1}^{(k)} | R_{t+1}, Z_t^{(k)})  (3.10)

The general Particle Filter is described in algorithm 2. One can easily recover the weights of the basic SIR Particle Filter by plugging the transition density in as the proposal in expression 3.10.

Algorithm 2: General Particle Filter

Input: R_{1:T}
// Initialization:
sample Z_0^{(k)} ∼ p_0(Z), k = 1, ..., N
set w_0^{(k)} ← 1/N, k = 1, ..., N
// Time evolution:
for t ← 1 to T do
    for k ← 1 to N do
        // Propose new state:
        Z_t^{(k)} ∼ q(Z_t | R_t, Z_{t−1}^{(k)})
        // Compute weight:
        w_t^{(k)} ∝ p(R_t | Z_t^{(k)}, Z_{t−1}^{(k)}) p(Z_t^{(k)} | Z_{t−1}^{(k)}) / q(Z_t^{(k)} | R_t, Z_{t−1}^{(k)})
    end
    // Resampling:
    for j ← 1 to N do
        sample Z_t^{(j)} ∼ Multinomial({Z_t^{(k)}, w_t^{(k)}}_{k=1}^N)
    end
end
return {{Z_t^{(k)}}_{k=1}^N}_{t=1}^T


3.2.4 Sample depletion and resampling techniques

The resampling move introduced in the SIR algorithm was intended to reduce the problem of sample depletion, characteristic of the SIS Particle Filter. In the latter, indeed, the distribution of the weights becomes very skewed as time goes on, leading quickly to a poor approximation of the target density.

The intuitive idea of resampling is to approximate a distribution with a set of equally weighted particles, by sampling them with probabilities proportional to their weights in the IS approximation. This step avoids the problem of sample degeneracy, while at the same time increasing the variance of the estimator.

So far we performed this resampling step by sampling from a multinomial distribution. To alleviate the increased variance due to resampling, it is possible to use other methods that still produce unbiased samples but add less randomness. If N_t^{(k)} denotes the number of times particle k is sampled at time t, the MC approximation in equation 3.8 can be rewritten as follows:

p(Z_{t+1} | R_{1:t+1}) ∝ Σ_{k=1}^N (N_t^{(k)}/N) p(R_{t+1} | Z_{t+1}, Z_t^{(k)}) p(Z_{t+1} | Z_t^{(k)})  (3.11)

When designing new sampling methods, the property that should be preserved for unbiasedness is:

E[ N_t^{(k)} | {w_t^{(k)}}_{k=1}^N ] = N w_t^{(k)}  (3.12)

While this is achieved by drawing from a multinomial distribution as described so far, other sampling schemes are possible, such as tree-based resampling, stratified resampling (Carpenter et al., 1999), and circular or systematic resampling (Whitley, 1994; Carpenter et al., 1999). In the present work we used the systematic resampling technique, which is widely used and is described in algorithm 3.

Algorithm 3: Systematic resampling

// Sample U_1:
U_1 ∼ Unif(0, 1/N)
for k ← 2 to N do
    U_k = U_1 + (k − 1)/N
end
// Transform by F_n^{−1}:
for k ← 1 to N do
    N_t^{(k)} = #{U_j : Σ_{i=1}^{k−1} w_t^{(i)} ≤ U_j ≤ Σ_{i=1}^{k} w_t^{(i)}}
end
return {N_t^{(k)}}_{k=1}^N

The basic idea behind this algorithm is that the sequence of U_k is systematically spread between 0 and 1. Then we use the inverse empirical distribution function F_n to find the N_t^{(k)}. This uses the well-known fact that if X ∼ F then F(X) ∼ Unif(0, 1). So if one has a uniform U, it is possible to obtain a sample from F as X = F^{−1}(U). In the present case we


do not technically use a uniform sample, but instead a sequence of numbers systematically spread between 0 and 1, which removes some randomness from the process while preserving the important properties of the distribution. Furthermore, F_n is a step function, where the steps occur at the particle indices k and are of size w_t^{(k)}; therefore the inverse operation can easily be carried out as described in the algorithm.
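A direct implementation of algorithm 3 (our own helper function, pure Python):

```python
import random

def systematic_resample(weights, rng=random):
    """Algorithm 3: systematic resampling. Given normalized weights,
    return the counts N^(k) (how many copies of particle k to keep)."""
    n = len(weights)
    u = rng.uniform(0.0, 1.0 / n)        # single uniform draw U_1
    counts = [0] * n
    cum, k = 0.0, 0
    for j in range(n):
        uj = u + j / n                   # U_j, evenly spread over (0, 1)
        # Invert the empirical cdf F_n: find k with F_n(k-1) < U_j <= F_n(k)
        while k < n - 1 and cum + weights[k] < uj:
            cum += weights[k]
            k += 1
        counts[k] += 1
    return counts

counts = systematic_resample([0.5, 0.25, 0.25], random.Random(1))
# counts always sums to N = 3, and particle 0 gets either 1 or 2 copies
```

A single uniform draw drives the whole step, which is what removes most of the resampling variance compared to multinomial sampling.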

Resampling is necessary to avoid sample depletion in the long run, but on the other hand it adds some variance by randomly choosing the N_t^{(k)}. Therefore, when one is interested in the filtering density, one should always prefer the weighted approximation 3.8, whereas resampling is used to continue the propagation of the particles.

Even with resampling, the problem of sample degeneracy is not completely solved. Indeed, as particles evolve, many of them are suppressed, and as time goes on only a few of the original ones remain. Resampling alleviates the problem somewhat, because the pool of particles is regenerated at each step. Nevertheless, if one is interested in the probability of a full path, the number of samples shrinks very fast. It remains possible to accurately estimate marginal distributions at a given time and related moments, as in filtering, but it rules out methods that require an accurate approximation of the probability of a path, such as maximum likelihood. Actually, the issue does not really come from the Particle Filter method itself, but rather from the fact that the distribution of interest is of arbitrarily high dimension, growing with time, and it quickly becomes impossible to estimate it accurately with a finite number of samples.

3.3 Advanced Particle Filters

The basic Particle Filter algorithms were presented in the previous section. It is possible to refine them further to obtain better performance. This section presents the Auxiliary Particle Filter and, in particular, derives an efficient, fully adapted Particle Filter for the specific model we are interested in.

3.3.1 Auxiliary Particle Filter

The idea of the Auxiliary Particle Filter, developed by Pitt and Shephard (1999), is to perform a pre-selection of the particles before letting them evolve, in order to put more attention on the particles likely to fit well at the next step. This should in principle produce more even weights and thus a better approximation of the filtering density. Indeed, if the posterior p(Z_{t+1} | R_{t+1}, Z_t) is very different from the prior p(Z_{t+1} | Z_t), because the next observation is very informative, then many particles might die, or equivalently get small weights. By comparison, if one could look one step ahead to see which particles are more likely to fit the data well, it would be possible to let them have more offspring. In the first exposition of the Auxiliary Particle Filter, this is achieved by jointly sampling the state and an index k. The indices, which play the role of auxiliary variables, are then discarded. More precisely, one samples from:

p(Z_{t+1}, k | R_{1:t+1}) ∝ p(R_{t+1} | Z_{t+1}, Z_t^{(k)}) p(Z_{t+1} | Z_t^{(k)}) w_t^{(k)}  (3.13)

To sample from this distribution using SIR, we need a proposal for the states and the index, such that (Z_{t+1}^{(j)}, j) ∼ q(Z_{t+1}, k | R_{t+1}, Z_t^{(k)}). Then we keep only the states and give them new weights defined as:

w_{t+1}^{(j)} ∝ p(R_{t+1} | Z_{t+1}^{(j)}, Z_t^{(k_j)}) p(Z_{t+1}^{(j)} | Z_t^{(k_j)}) / q(Z_{t+1}^{(j)}, k_j | R_{1:t+1})  (3.14)

At first sight, the new indices seem to add some confusion. Things might become clearer after seeing a basic example of a proposal distribution, which is actually not very advisable:

q(Z_{t+1}, k | R_{1:t+1}) ∝ p(R_{t+1} | µ_{t+1}^{(k)}, Z_t^{(k)}) p(Z_{t+1} | Z_t^{(k)})  (3.15)

In this case, Z_{t+1}^{(k)} is anticipated by looking at the mean, mode, or any other point estimate µ_{t+1}^{(k)} from the transition density p(Z_{t+1} | Z_t^{(k)}). It can be seen that particles whose µ_{t+1}^{(k)} produces a high likelihood are more likely to be selected. To implement this idea, it is necessary to first sample the new indices j from the marginal of equation 3.15, which is a multinomial distribution on k = 1, ..., N with probabilities proportional to λ_k = p(R_{t+1} | µ_{t+1}^{(k)}, Z_t^{(k)}). These probabilities can be seen as intermediary weights of the particles. By choosing this simple proposal, the final weights take the form:

w_{t+1}^{(j)} ∝ p(R_{t+1} | Z_{t+1}^{(j)}, Z_t^{(k_j)}) / λ_{t+1}^{(k_j)}  (3.16)

because the second part cancels out. It is easy to see how to adapt this to the general case with an arbitrary proposal distribution.

Current implementations of the Auxiliary Particle Filter do not actually perform two sampling steps. Indeed, it can be shown that the algorithm is equivalent to a standard Particle Filter, but targeting a slightly modified distribution. This formulation is described in algorithm 4 and requires only one sampling step, but with modified weights.

3.3.2 Fully Adapted Particle Filter

We have seen in section 3.2.3 that in the general Particle Filter, a proposal is used to approximate the posterior distribution. In the basic SIR algorithm, the proposal is simply the transition density and the particles are weighted according to the likelihood of the observations. If it were possible to sample directly from this target distribution, the particles would be more evenly weighted and the resulting approximation would be better. If the proposal is actually the true posterior, then the weights are given by w_{t+1}^{(k)} ∝ p(R_{t+1} | Z_t^{(k)}). This can be shown by rewriting the proposal as follows and plugging it into the definition of the weights given in equation 3.10, noticing that all terms but this one cancel out.

q = p(Z_{t+1} | R_{t+1}, Z_t^{(k)}) = p(Z_{t+1}, R_{t+1} | Z_t^{(k)}) / p(R_{t+1} | Z_t^{(k)})  (3.17)

= p(R_{t+1} | Z_{t+1}, Z_t^{(k)}) p(Z_{t+1} | Z_t^{(k)}) / p(R_{t+1} | Z_t^{(k)})  (3.18)


Algorithm 4: Auxiliary Particle Filter

Input: R_{1:T}
// Initialization:
sample Z_0^{(k)} ∼ p_0(Z), k = 1, ..., N
set w_0^{(k)} ← 1/N, k = 1, ..., N
// Time evolution:
for t ← 1 to T do
    for k ← 1 to N do
        // Compute a point estimate of Z_t^{(k)}, e.g.:
        µ_t^{(k)} = E[Z_t | Z_{t−1}^{(k)}]
        // Intermediary weights:
        λ_t^{(k)} ∝ p(R_t | µ_t^{(k)}, Z_{t−1}^{(k)}) w_{t−1}^{(k)}
    end
    // Resampling:
    sample N indices I_j with P(I_j = k) = λ_t^{(k)}
    for j ← 1 to N do
        // Propagate particles:
        Z_t^{(j)} ∼ p(Z_t | Z_{t−1}^{(I_j)})
        // New weights:
        w_t^{(j)} ∝ p(R_t | Z_t^{(j)}, Z_{t−1}^{(I_j)}) / p(R_t | µ_t^{(I_j)}, Z_{t−1}^{(I_j)})
    end
end
return {{Z_t^{(k)}}_{k=1}^N}_{t=1}^T


The gain is more evident in the Auxiliary Particle Filter. Indeed, let us choose as proposal:

q = p(Z_{t+1}, k | R_{1:t+1}) ∝ p(Z_{t+1} | R_{t+1}, Z_t^{(k)}) p(R_{t+1} | Z_t^{(k)})  (3.19)

Then the algorithm consists in sampling the particles with intermediary weights given by λ_{t+1}^{(k)} ∝ p(R_{t+1} | Z_t^{(k)}), making them evolve according to p(Z_{t+1} | R_{t+1}, Z_t^{(k)}) and reweighting them. But by decomposing the transition density as in equation 3.18, one can rewrite the proposal density; then all terms in expression 3.14 for the final weights of the particles cancel out. This basically means that every particle gets the same weight and that no second resampling step is needed. In this case the Particle Filter is said to be fully adapted.

To implement it, one can use the normal SIR Particle Filter algorithm, substituting the given proposal for the transition density and the prediction density p(R_{t+1} | Z_t^{(k)}) for the measurement density. There is no need for two resampling steps as in the original Auxiliary Particle Filter, because the final weights are all even.

Fortunately, it is possible to derive the required densities for our model. Indeed, the transition density (equations 2.16 and 2.17) and the observation process (equation 2.18) can be written more compactly as follows:

Z_{t+1} | Z_t ∼ N(µ(Z_t), Σ_z)    (3.20)

R_{t+1} | Z_{t+1}, Z_t ∼ N(hᵀ(Z_{t+1} − Z_t) + µ_f ∆t, σ_f² ∆t)    (3.21)

where:

µ(Z_t) = ( Y_t + f_NL(X_t, Y_t) ∆t, X_t + Y_t ∆t )ᵀ,    Σ_z = diag(σ_y² ∆t, σ_x² ∆t),    h = (0, 1)ᵀ

From which it follows that the joint distribution of Z_{t+1} and R_{t+1} given Z_t is again normal:

(Z_{t+1}, R_{t+1}) | Z_t ∼ N( ( µ(Z_t), hᵀ(µ(Z_t) − Z_t) + µ_f ∆t )ᵀ, ( Σ_z, Σ_z h ; hᵀΣ_z, σ_f² ∆t + hᵀΣ_z h ) )    (3.22)

It is now possible to factor this joint distribution into the two parts that we need for implementing the fully adapted particle filter:

p(Z_{t+1}, R_{t+1} | Z_t) = p(R_{t+1} | Z_t) p(Z_{t+1} | R_{t+1}, Z_t)    (3.23)

This factorization is not difficult to achieve using the properties of the multivariate normal distribution. Indeed, the first part is nothing but a marginal and the second a conditional distribution, for which standard formulae are known. Simplifying the results, we obtain the following expressions:

R_{t+1} | Z_t ∼ N( (Y_t + µ_f) ∆t, (σ_f² + σ_x²) ∆t )    (3.24)

Z_{t+1} | R_{t+1}, Z_t ∼ N(m, S)    (3.25)



where:

m = ( Y_t + f_NL(X_t, Y_t) ∆t, X_t + Y_t ∆t )ᵀ + ( 0, σ_x² ∆t )ᵀ · [ R_{t+1} − (Y_t + µ_f) ∆t ] / [ (σ_f² + σ_x²) ∆t ]

S = diag( σ_y², σ_x² σ_f² / (σ_f² + σ_x²) ) ∆t

A quick check shows that these expressions make sense. The expectation of the return R_{t+1} given the previous state Z_t is the conditional expectation of X_{t+1} − X_t, E[X_{t+1} − X_t | Z_t], plus the trend of the fundamental price µ_f ∆t, while its variance combines the variance of the mispricing X and of the fundamental price. Basically, not knowing the value of X_{t+1} increases the variance of the observation process by a term proportional to the variance of X.

For the distribution of the state Z_{t+1} given the next observation R_{t+1} and the previous state Z_t, we can notice that the expectation is not modified for Y, whereas the one for X_{t+1} is corrected by a term proportional to the difference between the observation and its expected value, and to the relative strength of σ_x and σ_f. It can be seen as a blind prediction followed by a correction after having seen the next observation.
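The two densities just derived are all that is needed for one step of the fully adapted Particle Filter. The following is a minimal NumPy sketch, not the thesis's implementation: the parameter values are arbitrary illustrative choices and f_nl is a placeholder for the nonlinearity f_NL(x, y) defined in chapter 2.

```python
import numpy as np

# Illustrative values only; NOT the thesis's calibrated parameters.
dt, mu_f, sig_f, sig_x, sig_y = 1.0 / 252, 0.05, 0.2, 0.1, 0.1

def f_nl(x, y):
    # Placeholder for the nonlinear drift f_NL(x, y) of chapter 2.
    return -x - y**3

def fully_adapted_step(X, Y, w, R_next, rng):
    """One step of the fully adapted Particle Filter (equations 3.24-3.25)."""
    N = len(w)
    pred_var = (sig_f**2 + sig_x**2) * dt
    # Intermediary weights from the prediction density p(R_{t+1} | Z_t), eq. 3.24:
    lam = w * np.exp(-0.5 * (R_next - (Y + mu_f) * dt) ** 2 / pred_var)
    lam /= lam.sum()
    idx = rng.choice(N, size=N, p=lam)          # resampling
    X, Y = X[idx], Y[idx]
    # Propagation from p(Z_{t+1} | R_{t+1}, Z_t), eq. 3.25:
    m_y = Y + f_nl(X, Y) * dt
    m_x = X + Y * dt + sig_x**2 / (sig_f**2 + sig_x**2) * (R_next - (Y + mu_f) * dt)
    s_y = sig_y**2 * dt
    s_x = sig_x**2 * sig_f**2 / (sig_f**2 + sig_x**2) * dt
    Y_new = m_y + np.sqrt(s_y) * rng.standard_normal(N)
    X_new = m_x + np.sqrt(s_x) * rng.standard_normal(N)
    return X_new, Y_new, np.full(N, 1.0 / N)    # final weights are all equal
```

Note that, as discussed above, no second resampling step appears: the returned weights are uniform by construction.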

3.4 Illustration of the Particle Filter

As already explained, the Particle Filter is a process of birth and death. Particles evolve independently according to the dynamic equations of the system until there is a new observation. Then the fitness of every particle is assessed by looking at the likelihood of the data given its trajectory. In SIR, a resampling step occurs, where particles with a good fitness are multiplied, whereas particles that fit the data poorly die. This produces a tree structure, much like a genealogical tree. An illustration of this process is given in figure 3.1. If you have access to the digital version of this document and Acrobat Reader, you can play the animation with the control buttons at the bottom of the figure.

In figure 3.1 a trajectory of the dynamical model (in red) has been simulated with parameters (α = −1, A = 1, β = 1, µ = 2, λ = 1), which corresponds to an individually mean-reverting, collectively speculative market with an intermediate level of uncertainty and a strict regulation. The system has three stable fixed points: one around zero, and a positive and a negative convention. The bottom panel shows the state X_t, the mispricing, while its derivative Y_t is plotted in the upper one. At the beginning, a pool of 70 particles is generated randomly with values uniformly distributed around the true state. Of course, in a real application the true state value is unknown, and particles are therefore initialized with a broader scope, around zero or whatever other prior information we have about the initial state of the system. In practice, though, many more particles are used and this does not present a problem. In the present set-up, with only a few particles, it was nevertheless necessary to concentrate them around the true value in order not to end up with only one particle after a few time steps.

By following the trajectory of the particles, it is possible to see how the successful ones are multiplied and how some of them disappear because they follow an unlikely path. One specific particle and all its offspring are plotted in blue. It is very interesting to see how it



Figure 3.1: An animated example of the Particle Filter

evolves with time. At some point it spreads well, before almost disappearing, while at the end it wins the game and remains the only survivor. This illustrates the problem of sample degeneracy: as time goes on, it is very likely that only one or a few particles from the origin survive. This effect obviously prevents estimating the distribution of a whole path of the system. Nevertheless, thanks to the resampling step, the marginal densities are still well defined.

It is of particular interest to notice the shape of the filtering distribution: it is clearly bimodal in some places and exhibits large variation over time. The bimodality comes from the fact that the system has two attractors, one around a positive and one around a negative convention. Because of the stochasticity of the process, there is a possibility of jumping from one attractor to the other. It can happen that the state is close to the boundary, causing some particles to change attractor; this reflects the high probability of such a jump at that moment. After seeing additional data, however, it becomes clear that one of the hypotheses was wrong, and only the particles remaining in the true attractor survive. This example of bimodality is a strong caveat against looking only at a few moments of the filtering density, such as the mean and the variance. For ease of representation, however, this is what is done most of the time.




Chapter 4

Parameter estimation

In the previous chapter, the filtering of the state was described. So far it was assumed that the parameters of the diffusion equations were known. This assumption, however, is never fulfilled in practice. It is therefore of major importance to find a way to estimate them. Three main approaches are possible:

• Maximum Likelihood (ML)

• Monte Carlo Markov Chain (MCMC)

• State augmentation

Application of all three methods was attempted during this work, but only the last one led to a feasible solution. ML is presented first (section 4.1) and will sometimes be used to display cross-sections of the likelihood function, but for illustration purposes only, due to its heavy computational cost. A full implementation of an MCMC algorithm was intended, but intermediate results showed very poor mixing properties and problems with the stability of the estimated posteriors (section 4.2). Eventually the simplest idea, namely state augmentation (section 4.3), proved to be the least bad solution, even if it suffers from numerous drawbacks. The chapter ends with a discussion of model checking and of the difficulty of parameter estimation for the model at hand.

4.1 Maximum Likelihood

The traditional approach to parameter estimation is Maximum Likelihood (ML). Given the specific Markovian structure of the model, the likelihood of the observations given the vector of parameters θ can be decomposed as follows:

p(R_{1:T} | θ) = ∏_{t=0}^{T−1} p(R_{t+1} | R_{1:t}, θ)    (4.1)

This likelihood is actually available as a by-product of the filtering, each element in the product being the normalizing constant in equation 3.7. Following Doucet et al. (2001) and Werner et al. (2011), we can derive an estimate of the log-likelihood as follows.




p(R_{t+1} | R_{1:t}, θ) = ∫ p(R_{t+1} | Z_{t+1}, R_{1:t}) p(Z_{t+1} | R_{1:t}) dZ_{t+1}    (4.2)

= ∫∫ p(R_{t+1} | Z_{t+1}, R_{1:t}) p(Z_{t+1} | Z_t) p(Z_t | R_{1:t}) dZ_t dZ_{t+1}    (4.3)

≈ ∑_{k=1}^N w_t^(k) ∫ p(R_{t+1} | Z_{t+1}, R_{1:t}) p(Z_{t+1} | Z_t^(k)) dZ_{t+1}    (4.4)

≈ ∑_{k=1}^N w_{t+1}^(k)    (4.5)

where we have used the MC approximation of the filtering density p(Z_t | R_{1:t}). The weights w_{t+1}^(k) are the unnormalized weights of the particles. Therefore, the log-likelihood can be estimated as:

l(θ) = log p(R_{1:T} | θ) = log ∏_{t=0}^{T−1} p(R_{t+1} | R_{1:t}, θ)    (4.6)

= ∑_{t=0}^{T−1} log p(R_{t+1} | R_{1:t}, θ)    (4.7)

≈ ∑_{t=0}^{T−1} log ∑_{k=1}^N w_{t+1}^(k)    (4.8)

This estimate of the log-likelihood is straightforward to obtain from the filtering algorithm. It is nevertheless almost impossible to use, because on the one hand it would require running a Particle Filter for each set of parameters, and on the other hand the resulting estimated likelihood function would be very jagged. With a parameter space such as ours, this precludes the direct use of ML methods. An example of its use is shown at the end of section 4.5. For a full application, one would need to compute the likelihood on a grid over the full parameter space, or develop some method approximating its derivative and apply a gradient-based optimization algorithm. No effort has been put into this line of research, though.
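The estimator of equation 4.8 can be written down in a few lines. The sketch below is a generic bootstrap Particle Filter accumulating the log of the average unnormalized weight at each step (equivalent to eq. 4.8 up to the normalization of the previous weights); init, propagate and obs_logpdf are user-supplied stand-ins for the model's prior, transition density and observation density.

```python
import numpy as np

def pf_loglik(observations, init, propagate, obs_logpdf, N=1000, seed=0):
    """Bootstrap Particle Filter estimate of the log-likelihood (eq. 4.8).
    init(N, rng) -> (N, d) particles; propagate(Z, rng) -> (N, d);
    obs_logpdf(r, Z) -> (N,) values of log p(R_t | Z_t)."""
    rng = np.random.default_rng(seed)
    Z = init(N, rng)
    loglik = 0.0
    for r in observations:
        Z = propagate(Z, rng)                 # sample from p(Z_t | Z_{t-1})
        logw = obs_logpdf(r, Z)
        m = logw.max()                        # log-sum-exp for stability
        w = np.exp(logw - m)
        loglik += m + np.log(w.mean())        # log of the average weight
        Z = Z[rng.choice(N, size=N, p=w / w.sum())]   # multinomial resampling
    return loglik
```

Running this for every candidate θ is exactly the cost problem described above: each likelihood evaluation is a full filter pass, and the resampling noise makes the resulting surface jagged.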

The main issue is actually that the likelihood depends not only on the observations and the parameters, but also on the state of the system, which is not observed. It is therefore necessary to integrate it out, using for example an MC approximation as we just did, leading to a simulated ML method. The likelihood depends on the probability of a whole path of the system, from t = 1 to T. As stated earlier, it is impossible to obtain a good sample of paths as time increases, due to the very high dimensionality of the problem.

Another issue with using ML for continuous diffusion processes is the discretization error. Durham and Gallant (2002) have proposed simulating m intermediate values between the observations to make the approximation error arbitrarily small. In our case, nonetheless, ∆t is very small and there is no need to worry about this problem.

An interesting and recent idea for estimating parameters with ML is to use exact simulation techniques, based on an algorithm that allows one to simulate a continuous diffusion process exactly



(Beskos et al., 2006). The technique is very attractive from a theoretical point of view, and we spent some time trying to apply it to our case. Unfortunately, it was impossible to extend it to such a multivariate model with hidden states, as some conditions on the process are not fulfilled.

4.2 Monte Carlo Markov Chain

Monte Carlo Markov Chain (MCMC) methods are intended to sample from high-dimensional probability distributions by constructing Markov chains that converge to the desired distributions. In the context of parameter estimation, MCMC is a tool to sample from the posterior distribution of the parameters. It is a Bayesian approach, in the sense that we assume that the parameters are random variables. Previous knowledge or expertise can be straightforwardly incorporated in the model through the design of well-suited prior distributions.

To perform Bayesian inference in the case of state-space models, it is necessary to sample from the joint posterior of the state and the parameters. To do so, we can use a simple Gibbs sampler that alternates between sampling θ given the state, and sampling the state given θ and the observations.

In the present work, we followed the approach proposed in Eraker (2001). The method is general enough for multivariate diffusion processes with unobserved variables such as ours. It was more convenient to come back to the original formulation of the model, where the state is Z_t = (Y_t, X_t, Q_t), in which Q_t, the log-price, is perfectly observed and the other variables are hidden. While Eraker (2001) argues that intermediate data need to be simulated between two observations in order to obtain a good approximation of the continuous diffusion process, in our case ∆t is small enough not to suffer from this drawback.

The joint posterior of the state and the parameters is then given by:

p(Z_{1:T}, θ) ∝ ∏_{t=1}^T p(Z_t | Z_{t−1}, θ) p(θ)    (4.9)

where, because the stochastic parts of the diffusion are all Wiener processes with independent normal increments:

p(Z_t | Z_{t−1}, θ) ∝ |Σ⁻¹|^{1/2} exp( −(1/2) (∆Z_t − µ_{t−1}∆t)ᵀ Σ⁻¹∆t⁻¹ (∆Z_t − µ_{t−1}∆t) )    (4.10)

and where µ_t and Σ are defined as follows:

µ_t = µ(Z_{t−1}, θ) = ( f_NL(X_{t−1}, Y_{t−1}), Y_{t−1}, µ_f − σ_f²/2 + Y_{t−1} )ᵀ,    Σ = ( σ_y², 0, 0 ; 0, σ_x², σ_x² ; 0, σ_x², σ_x² + σ_f² )    (4.11)

If we have a sample from the joint posterior 4.9, we can easily obtain a sample from the posterior of the parameters simply by discarding the state variables.



Sampling the state To design our Gibbs sampler, we need to be able to sample the state given the parameters and the observations. This is not straightforward, because the smoothing density is not available in closed form. One possible way is to sample the state at each time step conditioned on the previous and the next states. This amounts to simulating a diffusion bridge. The samples are clearly not independent, but are expected to converge to the distribution of interest. The diffusion bridge has the following distribution:

p(Z_t | Z_{t−1}, Z_{t+1}, θ) ∝ |Σ⁻¹| exp( −(1/2) (∆Z_t − µ_{t−1}∆t)ᵀ Σ⁻¹∆t⁻¹ (∆Z_t − µ_{t−1}∆t) − (1/2) (∆Z_{t+1} − µ_t∆t)ᵀ Σ⁻¹∆t⁻¹ (∆Z_{t+1} − µ_t∆t) )    (4.12)

This is derived directly by noting the following decomposition:

p(Z_t | Z_{t−1}, Z_{t+1}, θ) ∝ p(Z_t | Z_{t−1}, θ) p(Z_{t+1} | Z_t, θ)    (4.13)

The distribution 4.13 has no reason to be normal and can exhibit strange features, including bimodality. Eraker (2001) argues that for small ∆t we can simulate from it by using an accept/reject algorithm with the following normal proposal distribution:

Z_t | Z_{t−1}, Z_{t+1}, θ ∼ N( (1/2)(Z_{t−1} + Z_{t+1}), (1/2) Σ∆t )    (4.14)

Our experiments show that this works well in our case, with quite a low rejection rate.
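The bridge update can be sketched in a few lines. The sketch below is a stand-in, not the thesis's code: it evaluates the unnormalized target of equation 4.12 (via the decomposition 4.13) and uses the normal of equation 4.14 in an independence Metropolis-Hastings step rather than the pure accept/reject scheme of Eraker (2001); mu and Sig are the user-supplied drift and diffusion matrix.

```python
import numpy as np

def bridge_logtarget(z, z_prev, z_next, mu, Sig_inv, dt):
    """Unnormalized log of p(Z_t | Z_{t-1}, Z_{t+1}, theta), via eq. 4.13."""
    d1 = (z - z_prev) - mu(z_prev) * dt
    d2 = (z_next - z) - mu(z) * dt
    return -0.5 / dt * (d1 @ Sig_inv @ d1 + d2 @ Sig_inv @ d2)

def bridge_update(z, z_prev, z_next, mu, Sig, Sig_inv, dt, rng):
    """Update Z_t given its neighbours, proposing from the normal of eq. 4.14."""
    mean = 0.5 * (z_prev + z_next)
    L = np.linalg.cholesky(0.5 * Sig * dt)
    cand = mean + L @ rng.standard_normal(len(mean))

    def logq(x):                      # proposal log-density, up to a constant
        d = x - mean
        return -(d @ Sig_inv @ d) / dt

    log_acc = (bridge_logtarget(cand, z_prev, z_next, mu, Sig_inv, dt) - logq(cand)) \
            - (bridge_logtarget(z, z_prev, z_next, mu, Sig_inv, dt) - logq(z))
    return cand if np.log(rng.uniform()) < log_acc else z
```

Sweeping this update over t = 1, ..., T−1 gives one pass of the state-sampling step of the Gibbs sampler.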

Sampling the parameters The Gibbs sampler for the joint posterior 4.9 still requires sampling from the posterior of the parameters given the state. It is necessary to design a separate Gibbs sampler for this step, because the joint distribution of the parameters is intractable. The posterior of interest is a big product of the likelihood times the prior. If the functional shape of the likelihood for a given parameter is recognized as a standard distribution, then one can choose an appropriate conjugate prior; otherwise it is in principle always possible to use a general-purpose Metropolis-Hastings algorithm.

The easiest parameters to handle are α and β, because they affect the distribution linearly. By computing the product in equation 4.9 and completing the square, it is possible to see that the likelihood has a normal shape in α and β respectively. A normal prior can then be used.

We have implemented the resulting MCMC algorithm, with only α and β unknown, but the results were disappointing, with no reasonable convergence. The possible explanations that we see for this failure are threefold:

(a) The sampler for the state mixes too slowly, because the distribution of a state given its previous and next values is very narrow, especially for small ∆t.



(b) The estimator of the mean and variance of the posterior of α and β has too large a variance. Indeed, it involves averages over terms in 1/(X_t² ∆t²) and 1/(Y_t² ∆t²) respectively, and was shown empirically to exhibit very extreme values. Some more robust estimators, like the trimmed mean or the median, have been tried without success.

(c) The prior should include information about the market types. With the current MCMC sampler, it is indeed possible to jump from one type of market to another very easily, even to regions of the parameter space that produce explosive prices. One possible remedy would be to create more specific priors that take this into account.

Given the problems encountered so far, even with only two unknown parameters, no further attempt has been made to make the method work. We focused instead on state augmentation, described in the next section.

4.3 State Augmentation

One way to solve the estimation problem is simply to augment the state by adding the parameters as new unobserved hidden variables and then proceed to filtering. This procedure obviously suffers from some drawbacks, but it has the advantage of being easy to understand and to apply. Let us define the vector of parameters as θ_t. The time index does not mean that the parameters are time-varying, but rather that θ_t is the estimate of the parameters at time t. The new augmented state vector is now Z_t = (X_t, Y_t, θ_t). Henceforth, if θ_t is mentioned explicitly, it means that Z_t keeps its previous definition Z_t = (X_t, Y_t).

The increase in dimension makes the filtering problem more difficult, because it is harder to explore the parameter space efficiently. A fairly large number of particles is therefore needed. Particles are selected according to their usual state values and, indirectly, to their parameter values, which dictate the way they evolve. Two obvious strategies are possible to adapt Particle Filtering to the augmented state:

• Fixed parameters

• Artificial dynamic

The second strategy can include many refinements. Both of them are described in the following subsections.

4.3.1 Fixed parameters

The most basic idea of state augmentation is to keep the parameters fixed for each particle. The procedure consists in generating N particles randomly according to some prior for the state and the parameters. Then each particle is evolved using its own set of parameters. The selection at each step is influenced both by the state and, indirectly, by the parameters. Every specific Particle Filter algorithm described so far can be used. A fair number of particles is nevertheless necessary if any success is expected. In terms of state equations, very simply:

θ_{t+1} = θ_t    (4.15)

The main issue of this method is choosing a meaningful prior for the parameters. This will be discussed in more detail in section 4.3.3.



4.3.2 Artificial dynamic of parameters

A second approach is to make the parameters evolve according to some artificial dynamic, even if we assume that in reality the parameters are fixed. The procedure starts as above, but at each step the parameters of each particle are changed a bit. The question is of course how to make these changes. A first idea would be a random walk with a decreasing variance. The transition distribution of the parameters is then simply a Gaussian distribution centered at θ_t^(k) with decreasing variance. The state equation for the parameters would look as follows:

θ_{t+1} = θ_t + σ_t √∆t ε    (4.16)

where ε ∼ N(0, 1) and σ_t is a decreasing function of t. This idea will be tested and used later with a decreasing function such as:

σ_t = c (1 − t/(T + a))^b + d    (4.17)

It will be used in particular in combination with the fully adapted Particle Filter, and is then referred to as the fully adapted dynamic Particle Filter.
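As a concrete sketch, equations 4.16 and 4.17 translate directly into code; the constants a, b, c, d below are arbitrary illustrative choices, not the values used in the thesis.

```python
import numpy as np

def sigma_schedule(t, T, a=10.0, b=2.0, c=0.5, d=1e-3):
    """Decreasing standard deviation of eq. 4.17 (illustrative constants)."""
    return c * (1.0 - t / (T + a)) ** b + d

def jitter_parameters(theta, t, T, dt, rng):
    """Artificial random-walk dynamic for the parameters, eq. 4.16.
    theta: (N, p) array of per-particle parameters."""
    eps = rng.standard_normal(theta.shape)
    return theta + sigma_schedule(t, T) * np.sqrt(dt) * eps
```

The floor d keeps a little jitter even at t = T, so the parameter cloud never collapses completely.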

It is possible to refine this crude idea a bit by designing more complex parameter dynamics. One of them is to approximate the posterior distribution p(θ_{t+1} | R_{1:t}, θ_t) of the parameters as a mixture of normals (West, 1993; Liu and West, 2001). We basically have a sample of N values θ_t^(k), one for each particle, and estimate the posterior as a combination of them. To do so, one has to sample the parameters from a multivariate normal distribution, with mean m_t^(k) defined as a convex combination of the average θ̄_t and the specific value of the parameter θ_t^(k). The covariance S_t is likewise estimated from the empirical covariance of the θ_t^(k) times a factor. This factor should intuitively be a decreasing function of time, but Liu and West (2001) proposed a fixed value h defined through a discount factor δ around 0.95–0.99. The resulting transition density for the parameters is given as follows:

θ_{t+1}^(k) | θ_t^(k) ∼ N(m_t^(k), h² S_t)    (4.18)

m_t^(k) = a θ_t^(k) + (1 − a) θ̄_t    (4.19)

h² = 1 − ( (3δ − 1)/(2δ) )²    (4.20)

a² = 1 − h²    (4.21)

This artificial dynamic based on kernel density approximation has been implemented, but not in an efficient way, and the results are not very convincing. Our current implementation does not allow using more than a couple of hundred particles, limiting its scope of application to exploratory studies.

The model being very nonlinear and exhibiting bifurcations, all these approaches involving some blind artificial dynamic can be dangerous, because a particle could jump from one type of behaviour to something completely different, even for small changes of parameter values. It would be necessary to include some knowledge about the market types to design a more sensible dynamic for the parameters; nevertheless, nothing has been attempted in this direction.



4.3.3 Priors

A critical aspect of the state augmentation method, in particular for the fixed-parameter scheme but also with an artificial dynamic, is the initialization of the parameters. One has to design some sensible priors from which to sample the particles. In a first step, the parameters can be considered independent and sampled separately. We still need to assume some kind of distribution, a plausible range of values, etc.

The parameters could also be sampled jointly. Taking into account the different market types appearing for different joint values of the parameters in the dynamical model, it would make sense to sample the parameters according to some joint prior. For example, one could first sample from the set of market types and then sample the parameter values conditioned on the market type. The conditions on the parameters, however, are not all available in closed form, and this would require heuristic solutions.

The approach taken in this work is to sample most parameters independently with a uniform distribution over a reasonable range of values. Indeed, the values giving rise to plausible dynamics are not that spread out. We nevertheless checked that the parameter estimates do not get stuck at the limits of the priors, in which case we increase the range as much as necessary.

In more detail, β being always negative, we sample it between −5 and 0, whereas it is the opposite for B, whose range is 0 to 5. µ and λ are sampled between 0 and 6, 0 being excluded. It would seem that sampling them on a log-scale would be smarter, but actually all important changes in the market types occur in this small range of values for these parameters, hence this choice of prior.

The parameters α and A are sampled jointly and uniformly. Indeed, it was necessary to avoid the case where they are both positive, as it leads to unsustainable markets. Therefore, we first sample an index among 1, 2 and 3, selecting one of the possible combinations of signs, and then we sample the parameters on their restricted ranges of values. For α this range is between −5 and 0, or 0 and 5, and A is sampled between −12 and 0, or 0 and 12, depending on the case. These settings will be used for the simulation study as well as for the analysis of the DJIA data in the last chapter.
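These priors can be sketched directly; the ranges follow the text above, while the dictionary keys are just illustrative names.

```python
import numpy as np

def sample_prior(N, rng=None):
    """Sample N parameter vectors from the priors of section 4.3.3."""
    if rng is None:
        rng = np.random.default_rng()
    beta = rng.uniform(-5.0, 0.0, N)
    B = rng.uniform(0.0, 5.0, N)
    mu = rng.uniform(0.0, 6.0, N)        # 0 is excluded with probability one
    lam = rng.uniform(0.0, 6.0, N)
    # alpha and A jointly: the (alpha > 0, A > 0) combination is excluded.
    case = rng.integers(0, 3, N)         # 0: (-,-), 1: (-,+), 2: (+,-)
    alpha = np.where(case == 2, rng.uniform(0.0, 5.0, N), rng.uniform(-5.0, 0.0, N))
    A = np.where(case == 1, rng.uniform(0.0, 12.0, N), rng.uniform(-12.0, 0.0, N))
    return {"alpha": alpha, "A": A, "beta": beta, "B": B, "mu": mu, "lam": lam}
```

Sampling the sign index first makes the exclusion of the doubly-positive quadrant explicit, instead of rejecting draws after the fact.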

Clearly, more thought could be put into improving these prior distributions. As already mentioned, it would be sensible to sample all parameters jointly, because some combinations lead to completely different market types. One could put a prior on the type of market, possibly dependent on external information, and then sample from the individual distributions. Nevertheless, the algorithms are hoped not to be too dependent on the initial values, at least if a sufficiently large space is sampled and if the number of particles is big enough. It would be a good means, however, of including external information in the process.

4.4 Model checking

So far we have focused on ways to estimate the parameters of the model. Different methods have been presented; they will be tested in the next chapter and applied to a real case study at the end of the thesis. When data are simulated, the true values of the state and the parameters are known, and it is then possible to check the performance of the model



straightforwardly. When one is interested in a real application, however, these quantities are unknown. It thus becomes critical to find a way to test the adequacy of the model.

A basic idea for doing so is to look at the predictive distribution of the returns and see how the actual returns correspond to it. First, let us notice that the predictive density can be written as follows:

p(R_{t+1} | R_{1:t}) = ∫ p(R_{t+1} | Z_t) p(Z_t | R_{1:t}) dZ_t    (4.22)

≈ ∑_{k=1}^N w_t^(k) p(R_{t+1} | Z_t^(k))    (4.23)

where we used the MC approximation (equation 3.8) of the filtering density provided by the Particle Filter. The distribution of R_{t+1} given the previous state has been derived in the section on the fully adapted Particle Filter (section 3.3.2) and is repeated here for clarity:

R_{t+1} | Z_t ∼ N( (Y_t + µ_f) ∆t, (σ_f² + σ_x²) ∆t )    (4.24)

If we substitute equation 4.24 into equation 4.23, the distribution of interest is a mixture of normals. Its cumulative distribution function is simply obtained by integration. Let us call F the true predictive distribution corresponding to 4.22, and F_MC the estimated predictive distribution corresponding to equation 4.23. If we write the cumulative distribution function of the standard normal distribution, with zero mean and unit variance, as Φ, F_MC becomes:

F_MC(r_{t+1} | R_{1:t}) = ∑_{k=1}^N w_t^(k) Φ( [ r_{t+1} − (Y_t^(k) + µ_f) ∆t ] / √( (σ_f² + σ_x²) ∆t ) )    (4.25)

Now the idea for model checking is based on the basic fact that if X ∼ F, then F(X) ∼ Unif(0, 1). The true F is not available, but we have an MC approximation of it. The observed log-returns are supposed to follow the law of F. Then, if the estimated predictive distribution is correct, the values F_MC(R_{t+1} | R_{1:t}) should be uniformly distributed between 0 and 1.

This can easily be checked by looking at the histogram of the F_MC(R_{t+1} | R_{1:t}). If it is uniform, then everything is fine. If it has a U shape, it means that the true F is broader than the estimated one, and the model is said to be under-dispersed. Conversely, if the histogram has a concave shape, then the true F is narrower and the model is said to be over-dispersed. The presence of slopes would indicate a systematic bias in the estimation.
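This check is straightforward to code. A sketch, assuming the filtered Y-particles and weights from a Particle Filter run are available as lists of arrays (Φ is computed via the error function to avoid external dependencies):

```python
import numpy as np
from math import erf, sqrt

def std_normal_cdf(z):
    # Phi via the error function: Phi(z) = (1 + erf(z / sqrt(2))) / 2
    return 0.5 * (1.0 + np.vectorize(erf)(z / sqrt(2.0)))

def pit_values(returns, Y_particles, weights, mu_f, sig_f, sig_x, dt):
    """Probability integral transforms F_MC(R_{t+1} | R_{1:t}) of eq. 4.25."""
    sd = sqrt((sig_f**2 + sig_x**2) * dt)
    u = []
    for t in range(len(returns) - 1):
        z = (returns[t + 1] - (Y_particles[t] + mu_f) * dt) / sd
        u.append(float(np.sum(weights[t] * std_normal_cdf(z))))
    return np.array(u)   # should look uniform on [0, 1] if the model fits
```

A histogram of the returned values can then be inspected for a U shape, a hump, or a slope.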

We will apply this idea briefly in the chapter on the simulation study and use it in the application to real data to check the quality of the modelling. Nevertheless, it should be mentioned that this test is only a first step towards a full procedure for model testing, and more should be done in this direction.



4.5 Why is the problem so hard?

As we will see in the chapter on the simulation study, the methods described so far are not as efficient as one could expect, especially for estimating the parameters. Actually, in many cases the estimates are quite wrong and do not converge to a solution. Different explanations come to mind for this issue, some of them potentially leading to improvements of the methods.

To start with, estimating the parameters of a continuous diffusion process from discrete observations is known to be hard, and the same holds for state-space models with a continuous parameter space. An extensive literature has been devoted to this issue, and there is no magical solution out there that solves every problem. The difficulties are even greater with a multivariate diffusion that is only partially observed or, otherwise stated, has some hidden variables. Indeed, it is necessary to perform inference on both the hidden states and the parameters simultaneously.

Second-order and nonlinearity More specifically to our case, the system of stochastic differential equations describing the dynamic can be seen to be of second order: it is the second derivative of the price and of the mispricing that depends on the function f_NL(x, y). This means that the parameters do not directly influence the values x or p, nor their rates of change or simple differences like returns, but the way the returns themselves vary. This is a feature of the model explicitly intended to represent the speculative behaviour of traders, but at the same time it obviously makes the model much harder to estimate. Furthermore, the parameters influence the dynamic in a very nonlinear way, which compounds the second-order problem. Taking the returns as observations instead of the price itself was already one step in this direction, making the estimation of Y_t easier.

It seems that to tackle this issue fully, it is necessary to look at more than one return at a time, to get information about their rate of change. This would actually be an argument to discard the methods based on state augmentation, which select particles at each time step based on the current observation, and to focus on ML and MCMC, which are batch methods taking into account the whole history of observations. Nevertheless, other issues have been shown to arise with the latter.

Two main solutions come to mind to address this problem using the Particle Filter and state augmentation:

• Design an artificial dynamic of the parameters that takes into account the variation of returns, or returns of returns.

• With the Auxiliary Particle Filter, use a proposal that looks more than one return ahead and thus bases its selection on the variation of returns as well.

Some attempts have been made to implement the first idea, but with no success at all. The variation of returns is indeed much too stochastic to be used directly. Even when smoothing over several steps, the resulting dynamics do not improve the results, on the contrary. The second idea has not been applied because it is not so straightforward to implement: the question is how to design the proposal. One could use a point estimate for the evolution of the state, but this is known to be a bad solution. More could be developed along this line of research.



Identifiability and market types Last but not least, it seems that the parameters are not completely identifiable. Actually, if one looks at simulations with different parameter values, it is sometimes difficult to notice any difference. This comes from the nonlinearity and the second-order nature of the dynamic. An example is shown in figure 4.1, which shows three simulated daily price series over a period of ten years. The sequence of random numbers is the same; only the parameters α and A are modified. Of course there are some differences between (a) and (b), but honestly they are not striking, and it can be understood why the methods have some trouble converging to the true parameter values. On the other hand, a tiny change of parameters can create a big difference in the dynamic: for example, in (c) α is changed slightly, creating a dynamic with no finite attractor and thus leading to explosive prices.

We studied in more detail the case displayed in figure 4.1(b) by computing the likelihood for the parameters α and A, using the estimator developed in section 4.1:

l(θ) ≈ Σ_{t=0}^{T−1} log( Σ_{k=1}^{N} w_{t+1}^{(k)} ).

A Particle Filter with 5 000 particles is run for each tested parameter value, while all the other parameters are kept at their true values. This procedure is computationally very expensive and took approximately two days of calculation. It is clearly not applicable in practice, but we tried it for the sake of illustration and to gain some insight into cross-sections of the likelihood function.
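As a sketch, the likelihood estimator above is a cheap by-product of any Particle Filter run, provided the unnormalized weights are stored at each step. The array layout and names below are illustrative, not the thesis code:

```python
import numpy as np

def smc_log_likelihood(unnorm_weights):
    """Estimate l(theta) from the unnormalized particle weights.

    unnorm_weights: array of shape (T, N) holding the weights
    w_{t+1}^{(k)} of the T filtering steps, before normalization.
    """
    # l(theta) ~ sum_t log( sum_k w_{t+1}^{(k)} ), as in section 4.1
    return np.log(unnorm_weights.sum(axis=1)).sum()
```

Scanning a grid of parameter values then requires one full filter run per tested value, which is what makes the likelihood cross-sections above so expensive to compute.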

The resulting log-likelihood sections are displayed in figure 4.2. Interestingly, the likelihood of the parameter α seems to be more or less well defined, with a maximum not too far from the true parameter shown in red. It is nevertheless puzzling that the likelihood does not drop steeply after zero, because such markets are actually explosive, like the one in figure 4.1(c), and should therefore have a very small likelihood. To investigate this further, we filtered the price (third panel of figure 4.2), once with the true parameter values (in blue) and once with the parameters corresponding to the explosive market (in red). The latter case is clearly much worse, with too wild a dynamic, which is somewhat reassuring.

The log-likelihood of the parameter A, on the other hand, is much less smooth. Furthermore, it is slightly bimodal, seeming to indicate that A should be away from zero but could be positive as well as negative. It thus looks quite difficult to estimate A in this case. Similar likelihood experiments have been carried out on other types of markets. In many cases, degeneracy has been identified, either with some bimodal feature as for A, or with a very flat region over a large range of parameter values bordered by an abrupt drop of the likelihood, indicating that many values are equally plausible.

In such a context, it looks unrealistic to hope to estimate the parameters precisely. One should maybe focus more on identifying the market types, defined over broad ranges of parameters: for example, identifying whether we are in the case of a collectively and individually mean-reverting market, or a collectively mean-reverting and individually speculative market. These market types connect to features of the dynamical system that are more likely to be identifiable, such as the existence of different attractors or limit cycles.


4.5 Why is the problem so hard?


Figure 4.1: Simulated stock log-prices with different values of some parameters, resulting in similar trajectories for (a) and (b) and explosive behaviour for (c). (a) α = −1, A = −10; (b) α = −1, A = −1; (c) α = 1/2, A = −1.



Figure 4.2: The first two panels display the simulated log-likelihood of the data in figure 4.1(b) for the parameters A and α, using the method described in section 4.1 with 5 000 particles. The third panel shows the filtering of X, with the blue line for the true parameters and the red line for the parameters of figure 4.1(c).


Chapter 5

Simulations

In order to test the performance of the methods for filtering and parameter estimation, an extensive set of simulations has been run. For illustration, we start by showing an example of filtering of the state of the system (section 5.2). Nevertheless, the main point we are interested in is the estimation of the parameters. Therefore, in a second step, we focus on the ability of the different methods to estimate the parameters. We first show a detailed example (section 5.3) before presenting the results of extensive simulations (section 5.4). In the last part, the identifiability of the market types is tested through a likelihood method (section 5.5). Additionally, the technique for model checking based on the predictive distribution is illustrated (section 5.6).

5.1 Market types

Six different market types have been chosen; their parameter values are displayed in table 5.1. They are a subsample of the market types presented in Yukalov et al. (2009). Their identifiers correspond to the figure numbers in that paper, where the phase portrait of the corresponding deterministic system and a more detailed explanation of the dynamics can be found.

They have been chosen to cover a fair variety of markets with different features. Section 2.1.3 describes in more detail the interpretation of each parameter. Market types 1 and 5 represent equilibrium markets, with individual and collective mean-reverting strategies and different levels of uncertainty and regulation. They are the simplest market types and have only one fixed point, at zero mispricing, meaning that the stock price varies essentially around its fundamental value.

Types 9 and 10 are more complicated, with individual speculative traders balanced by a collective mean-reverting strategy. They differ in the level of regulation. Two attractors are associated with their dynamic, a positive and a negative convention, together with some limit cycles.

Market types 15 and 18 are somehow reversed, with individual mean-reverting and collective speculative behaviours. This could represent situations where some groups of traders can create big changes in the market through their speculation. They differ in the level of regulation and uncertainty. Unlike types 9 and 10, they have three


stable fixed points: one positive, one negative and one neutral convention. Type 18 furthermore shows some complex limit cycles.

market    α     A     β    B    µ    λ
1        -1   -10    -1    1    1    1
5        -1   -10    -1    1    2    2
9         5    -1    -1    1    5    2
10        5    -1    -1    1    5    3
15       -1     1    -1    1    2    1
18       -1     1    -1    1    3    3

Table 5.1: Parameter values for the different market types

5.2 Filtering of the state

An example of the filtering of the states when the parameters are assumed to be known is given in figure 5.1. Daily data are simulated over a period of 15 years with parameters corresponding to a simple, strictly regulated market with individual and collective mean-reverting agents and low uncertainty (type 1 in table 5.1). The basic SIR Particle Filter is used with 1 000 particles, initialized uniformly at random between −3 and +3 for X0

and Y0. The black line shows the simulated trajectory and the red line represents the filtering, while a 90% confidence interval is depicted in light blue. Whereas the Particle Filter actually provides information on the whole filtering distribution, we display here only some summary statistics for illustration purposes. In the next example we will look at the full density of the particles. The weighted-sample approximation is used to compute the moments of the distribution, because it has less variance.

The filtering appears to be of good quality and the coverage of the confidence interval is quite good. At the beginning it varies a lot, because many particles that are too far from the true values need to be eliminated. The actual price is not shown: being perfectly observed, there is no need to filter it.
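The SIR Particle Filter used here can be sketched generically as follows. The transition and observation densities are placeholders for the model of this thesis, and the toy random-walk model in the usage example is purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def sir_filter(observations, n_part, propagate, loglik, init):
    """Bootstrap SIR Particle Filter sketch.

    propagate(x): samples the state transition for an array of particles.
    loglik(y, x): observation log-density of y for each particle.
    init(n): draws the initial particles (here uniform, as in figure 5.1).
    Returns the filtering means computed from the weighted samples.
    """
    x = init(n_part)
    means = []
    for y in observations:
        x = propagate(x)                 # sample from the transition density
        logw = loglik(y, x)              # weight by the observation density
        w = np.exp(logw - logw.max())
        w /= w.sum()
        means.append(w @ x)              # weighted-sample mean estimate
        x = x[rng.choice(n_part, size=n_part, p=w)]  # multinomial resampling
    return np.array(means)

# toy usage: random-walk state observed with Gaussian noise of sd 0.5
means = sir_filter(
    observations=np.zeros(30), n_part=1000,
    propagate=lambda x: x + rng.normal(0.0, 0.1, x.shape),
    loglik=lambda y, x: -2.0 * (y - x) ** 2,
    init=lambda n: rng.uniform(-3.0, 3.0, n),
)
```

The initial wide uniform cloud is pruned within the first few steps, which is the transient variability visible at the beginning of figure 5.1.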

In figure 5.2, the density of the particles for the hidden variable X along time can be seen. The simulated data correspond to a market with individual mean-reverting and collective speculative behaviour (type 15 in table 5.1). The true trajectory is shown in black. The dynamic possesses three attractors: one positive, one negative and a neutral one. It is interesting to notice how most of the particles are concentrated around the true trajectory, while a fair number of them are actually exploring the negative convention. This demonstrates that, in this case, this configuration is not negligible and could be the right one. This example also serves as a caveat against using only the first few moments of the filtering density: here, the mean value would lie between the two trajectories, leading to a wrong interpretation of the underlying dynamic.
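The density image of figure 5.2 can be reproduced from the weighted particle cloud with a simple Gaussian kernel smoother; the bandwidth value below is an arbitrary illustration:

```python
import numpy as np

def weighted_kde(grid, particles, weights, bw=0.15):
    """Gaussian kernel density of a weighted particle cloud on a grid,
    normalized to a maximum of one for ease of visualization."""
    z = (grid[:, None] - particles[None, :]) / bw
    dens = (weights[None, :] * np.exp(-0.5 * z ** 2)).sum(axis=1)
    return dens / dens.max()
```

Calling this once per time step with the particles and normalized weights of that step, and stacking the resulting columns, gives the two-dimensional density plot.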

5.3 Augmented filtering

In this section, the ability of the Particle Filter to estimate the parameters is tested. Again, 1 000 particles are used to approximate the distribution. While this was enough in the


[Three panels (top to bottom): Y, X and Qf = Q − X.]

Figure 5.1: Filtering of the state with the SIR Particle Filter using 1 000 particles. In red the mean estimate, with a 90% confidence interval in light blue. The data (in black) are simulated according to market type 1.



Figure 5.2: Density plot of the particles for the variable X. A kernel smoother with weighted data is applied at each time step and normalized for ease of visualization. The black line is the true trajectory. The data are simulated according to market type 15.

previous case where the parameters were known, it is very likely to be insufficient in the present case. However, if we want to display the evolution of the estimators over time, we need to restrict ourselves to a small number of particles, because otherwise the storage requirement becomes overwhelming. Nevertheless, for the further extensive study, we used up to 10 000 particles and stored only the final value.

Figure 5.3 shows the filtering of the parameters for the same simulation setting as in the last example (figure 5.1). The scheme used for the state augmentation is the one with fixed parameters described in section 4.3.1. The plain black line is the evolution of the mean estimate of the normal SIR Particle Filter, which can be seen to suffer from two main drawbacks:

• the estimators do not converge to the true values;

• sample depletion: after a few iterations, only a small subset of the original particles survives. This is visible because the estimate stops varying.

The fully adapted Particle Filter (dashed line in figure 5.3), described in section 3.3.2, on the other hand shows better performance. The problem of sample depletion is clearly alleviated, though still present, and the parameters seem to converge better to the true values. In particular, the signs of the parameters α and A are correctly recovered, which is a crucial feature of the market type. To be honest, though, this is a lucky strike, and reality is not always so kind.

Figure 5.4 shows the result of parameter filtering applied to the first 15 years of the data simulated in section 2.3 (market type 15 in table 5.1). In this case the results are less good than in the previous example. In particular, both methods (the normal SIR and the fully adapted Particle Filter) estimate the parameter A to be negative, even though it is actually slightly positive. This problem will turn out to be recurrent for this market type in the next section on the extensive simulation analysis.
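The fixed-parameter state-augmentation scheme of section 4.3.1 used in these examples can be sketched as follows. Each particle carries its own parameter vector, which is only ever copied by resampling and never perturbed; this is exactly the mechanism behind the sample depletion observed above. The names below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def init_augmented(n_part, d_state, sample_prior):
    """Draw the initial augmented particles (state, theta)."""
    states = rng.uniform(-3.0, 3.0, (n_part, d_state))
    thetas = sample_prior(n_part)     # one parameter vector per particle
    return states, thetas

def resample_augmented(states, thetas, w):
    """Multinomial resampling of augmented particles: each parameter
    vector survives or dies with its particle and is never moved, so
    the set of distinct theta values can only shrink over time."""
    idx = rng.choice(len(w), size=len(w), p=w)
    return states[idx], thetas[idx]
```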


[Six panels: the filtered α, A, β, B, µ and λ over time.]

Figure 5.3: Market type 1. Filtering of parameters with the normal SIR (plain line) and the fully adapted Particle Filter (dashed line) using 1 000 particles. The red line indicates the true value of the parameters.


[Six panels: the filtered α, A, β, B, µ and λ over time.]

Figure 5.4: Market type 15. Filtering of parameters with the normal SIR (plain line) and the fully adapted Particle Filter (dashed line) using 1 000 particles. The red line indicates the true value of the parameters.


5.4 Extensive simulation study

To test the methods' ability to recover the parameters, we designed an extensive simulation study. The methods tested are the normal SIR Particle Filter (algorithm 1), the fully adapted Particle Filter (section 3.3.2) and the fully adapted dynamic Particle Filter (section 4.3.2) with a modified random walk (equation 4.17). We restrict ourselves to these three methods, even though other variations have been implemented, such as the auxiliary Particle Filter with point estimates (algorithm 4) and the artificial dynamic with kernel smoothing (section 4.3.2), because the latter are computationally much heavier and did not produce better results in our exploratory study.

Daily prices are simulated for the six market types over a period of 15 years. The parameters of the underlying Geometric Brownian Motion are chosen as in section 2.3, whereas the standard deviation of the drift process of the mispricing is chosen smaller (σy = σx = σf). For this range of values, the dynamic is still interesting while being slightly more manageable.

5.4.1 Multiple realizations

In a first set of simulations, we generate 50 different realizations of the stock prices for each market type. The three methods are then run once on each data set. This is necessary to assess the power of the methods in many different situations: one could by chance pick a specific time series on which a method succeeds but would actually fail in other cases, as in the examples of the preceding section 5.3. To assess the power of the methods, we compare the estimated parameters to the true known values by computing the root mean squared error (RMSE). Furthermore, the recovery rate of the sign of α, of A, and of both at the same time is also shown, because these signs define the general behaviour of the market.
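The performance measures just described can be sketched as follows; the column indices for α and A are assumptions of this sketch:

```python
import numpy as np

def performance(estimates, true_theta, i_alpha=0, i_A=1):
    """RMSE per parameter and sign-recovery rates over repeated runs.

    estimates: (n_runs, n_params) array of estimated parameter vectors.
    true_theta: the true parameter vector used for the simulation.
    """
    rmse = np.sqrt(((estimates - true_theta) ** 2).mean(axis=0))
    ok_alpha = np.sign(estimates[:, i_alpha]) == np.sign(true_theta[i_alpha])
    ok_A = np.sign(estimates[:, i_A]) == np.sign(true_theta[i_A])
    # recovery rates of the sign of alpha, of A, and of both at once
    return rmse, ok_alpha.mean(), ok_A.mean(), (ok_alpha & ok_A).mean()
```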

The results displayed in table 5.3 are a bit disappointing. As a first remark, the errors seem to be large in most cases and for most parameters. While the market types 1, 5, 15 and 18 are still reasonably well estimated, the performance on markets 9 and 10 is really poor. A possible explanation is that they do not have an attractor at an equilibrium position and usually exhibit strong, almost deterministic periodic variations. It is nevertheless puzzling that a case driven more by its deterministic part than by its stochastic component is actually harder to estimate.

For market types 1 and 5, the errors may look big, but it depends on what degree of precision we hope to achieve. The explorations done in section 4.5 tend to show that only a rough estimate of the parameter values can be expected. With this in mind, we can conclude that the results are convincing. The fully adapted Particle Filter, with and without dynamic, is in particular able to identify correctly the signs of the key parameters α and A in more than 80% of the cases, which is a significant achievement.

Similar performance is reached for the market types 15 and 18 in terms of error amplitude. The sign of A, on the other hand, is recovered rather badly, with a recovery rate worse than chance. This may come from the fact that these market types have a fixed point at zero: if the mispricing trajectory does not jump to one of the other conventions, or does so only rarely, it might be difficult for the method to notice the difference from a market at equilibrium, where A is negative.


5.4.2 Consistency of the estimator

In a second set of simulations, we test the consistency of the algorithms. Particle Filter methods indeed depend on the initialization of the particles and, in general, on the seed used. Therefore, in this set-up we simulate one realization of the stock prices for each market type, and then run the three methods 50 times with different seeds. The estimated parameters are expected not to fluctuate too much.

The boxplot in figure 5.5 shows the distribution of the parameter estimates for market type 1. As a first remark, the spread of the estimates is relatively wide, especially for the parameter A. Nevertheless, recalling the discussion of section 4.5, it is clear that we cannot expect the estimates to be very precise; in particular, the likelihood for the parameter A often seems degenerate (with a very wide maximum, or multiple maxima).

It is interesting to notice how the fully adapted Particle Filter improves over the normal SIR algorithm: the estimates have a much more concentrated distribution and their mean values are better for all parameters, except for β, where they are almost the same. The fully adapted dynamic Particle Filter seems to have a tendency to exacerbate some features by pushing some parameter values, like β, farther than necessary. Interestingly, it estimates B to be negative, which leads to an unstable market and is forbidden for the other methods, where the prior on this parameter is restricted to positive values. Its variance is also clearly bigger than that of the fully adapted Particle Filter. At least in this particular case, A seems to be the most difficult parameter to estimate.

Owing to space constraints, it is not possible to display all the corresponding boxplots. The results of the simulations for every market type are therefore summarized in table 5.4, which shows the standard deviation of the estimates over the 50 repetitions.

Similarly to market type 1, explored in more detail above, the variance of the fully adapted Particle Filter is almost always lower than that of the other methods. The cases 9 and 10 are slightly worse than the others, but the discrepancy is not as pronounced as when looking at many realizations as in the previous section. This seems to indicate that, for these two market types, the RMSE of the methods is mainly driven by the bias term.

5.5 Identifiability

Some problems pointed out at the end of the previous chapter, in section 4.5, deserve to be tested. In particular, we want to see to what extent a given market type is identifiable. To do so, we designed a basic experiment where data are simulated with parameter values corresponding to each of the 6 tested market types. Then a Particle Filter with known parameters is run on the data and the log-likelihood is estimated with the method described in section 4.1. We run such a Particle Filter with the parameters of each of the 6 market types. In other words, we look at the likelihood of some data given different sets of parameters, one of them being the true one. The number of particles is 5 000, hopefully providing a fair estimate of the likelihood, keeping in mind all the caveats that have already been mentioned about it.

The results of the experiments are displayed in table 5.2. The rows correspond to the true market type used to simulate the data, while the columns indicate which set of parameters has been used to perform the filtering. For each experiment, the log-likelihood is given.


[Boxplots of α, A, β, B, µ and λ for the normal SIR, fully adapted and fully adapted dynamic methods.]

Figure 5.5: Boxplot of estimated parameters for market type 1 and three different methods.


The maximum likelihood of each row is highlighted in the table. Of course, we expect the maximal log-likelihood to occur when the true parameters are used in the filtering, i.e. on the diagonal. Table 5.2 shows that this is not always the case. When the true market type is 5, type 1 gets a slightly better likelihood score; the same happens for types 15 and 18. Types 9 and 10, on the other hand, have a clearly defined maximum at their true position. It is interesting that the market types for which the augmented filtering had the most trouble are better defined in terms of likelihood. A possible explanation is that markets 1 and 5, and likewise 15 and 18, are not well distinguished from one another: their dynamics are quite similar, with the same main features. Actually, in the previous experiments we did not check whether estimates for market type 1 were close to the parameters of market type 5, for example, so it is likely that such confusions occurred before. What was clear was that the signs of α and A were correctly recovered, which is not in contradiction with the present confusion between neighbouring markets. It is still interesting to notice that the likelihood estimates of market types 9 and 10 are well behaved, while no algorithm was able to estimate their parameters correctly.

         1        5        9        10       15       18
1    56 497*  56 494   56 485   56 485   56 495   56 495
5    56 496*  56 493   56 484   56 485   56 492   56 492
9    56 202   53 359   56 488*  51 987   56 068   50 810
10   53 782   34 591   54 703   56 487*  52 566   49 372
15   56 496   56 495   56 489   56 486   56 498   56 500*
18   56 496   56 494   56 488   56 487   56 500*  56 500*

Table 5.2: Log-likelihood estimates for different combinations of market types. The rows are the true market types from which the data come, while the columns are the parameters used for the filtering. The maximum of each row is marked with an asterisk.

5.6 Model checking

In this section we briefly apply the method for model checking described in section 4.4. We simulate data from market type 15 and apply the fully adapted Particle Filter. The predictive distribution function is easily obtained as a by-product of the filtering. We then compute FMC(Rt+1 | R1:t) for every observation and look at the resulting histogram (figure 5.6). The result is quite convincing, with an apparently uniform distribution and no systematic departure. Similar conclusions have been drawn from experiments on other market types, but they are not displayed here. It will be interesting to compare this result to the real case study in the next chapter.
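A sketch of this check: the predictive distribution function at each realized return is approximated by the weighted fraction of predictive particles below it, and should be uniform under a well-specified model. The array layout is an assumption of this sketch:

```python
import numpy as np

def pit_values(pred_returns, weights, observed):
    """Probability integral transform F_MC(R_{t+1} | R_{1:t}).

    pred_returns: (T, N) returns drawn from the predictive distribution,
    weights: (T, N) normalized particle weights,
    observed: (T,) realized returns.
    Under a well-specified model the output is ~ Uniform(0, 1).
    """
    return (weights * (pred_returns <= observed[:, None])).sum(axis=1)
```

The histogram of these values is what figure 5.6 displays.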

5.7 Conclusion of simulations

The simulations confirm that the problem of estimating the parameters is indeed very hard. The performance of the algorithms is globally poor if one is interested in estimating the parameter values precisely. On the other hand, if the focus is rather on market type identification, the results are more convincing, especially when looking at the recovery rate of the signs of α and A.



Figure 5.6: Histogram of FMC(Rt+1 | R1:t) for data simulated according to market type 15. It should look like a uniform distribution.

Types 9 and 10, characterizing markets with individual speculative and collective mean-reverting behaviours, are clearly the most difficult to estimate. The reasons for this are unclear, because their dynamic is actually mainly governed by its deterministic component. A possible reason could be that the mispricing trajectories tend to stay trapped in the limit cycles surrounding the attractors, and do not jump often from one attractor to another.

Market types 15 and 18 are overall better estimated than types 9 and 10, but the parameter A is systematically wrong. The proposed reason is that the mispricing can fluctuate for long periods of time around the neutral attractor, making the dynamic indistinguishable from a market at equilibrium, characterized by A < 0. It is likely that increasing the length of the time series under study, and thus the chance of jumps to a positive or negative convention, would improve the performance of the method.

Market types 1 and 5, on the other hand, are the easiest to estimate, but they are also the least interesting: their dynamic is close to equilibrium and does not display very interesting features. It is nevertheless a good result that the methods seem to work relatively well in this simple case, especially for recovering the signs of α and A.

Overall, the fully adapted Particle Filter seems to perform better than the other algorithms. Furthermore, because it has more even weights, it should in principle suffer the least from sample depletion and would therefore be expected to produce widely spread parameter estimates. It is then encouraging to notice that it is the one with the best concentration of parameter estimates. In the next chapter, when real financial data are analysed, the fully adapted Particle Filter will thus be used systematically.


market  method                     α      A      β      B      µ      λ   sign α  sign A  sign both
1       normal SIR               2.45   8.86   1.83   2.01   2.77   2.75    0.62    0.70    0.34
1       fully adapted            0.93   8.73   1.70   1.06   2.01   1.78    0.96    0.78    0.74
1       fully adapted dynamic    2.86   5.32   8.08   4.83   3.25   2.29    0.94    0.96    0.90
5       normal SIR               2.47   8.89   2.12   2.02   1.88   1.81    0.58    0.66    0.30
5       fully adapted            1.56   7.44   1.66   1.39   1.46   1.01    0.84    0.88    0.72
5       fully adapted dynamic    3.13   5.27   7.15   4.44   2.52   1.28    0.94    0.96    0.90
9       normal SIR               5.88   5.94   2.14   2.07   2.48   2.15    0.38    0.68    0.36
9       fully adapted            7.25   5.79   1.76   2.21   3.11   0.98    0.26    0.74    0.26
9       fully adapted dynamic    9.80   5.09   1.67   3.62   3.41   0.55    0.00    0.44    0.00
10      normal SIR               6.55   6.31   2.05   1.84   2.62   1.68    0.32    0.62    0.32
10      fully adapted            7.20   5.85   2.22   1.78   2.73   1.33    0.19    0.50    0.19
10      fully adapted dynamic    9.58   9.60   2.35   0.74   3.65   0.35    0.00    0.02    0.00
15      normal SIR               2.30   6.25   2.03   1.94   1.96   2.46    0.70    0.28    0.28
15      fully adapted            0.72   3.35   2.06   1.13   0.98   1.85    0.90    0.18    0.18
15      fully adapted dynamic    2.70   6.09  10.03   4.91   2.01   2.44    0.80    0.00    0.00
18      normal SIR               2.14   6.44   1.78   1.98   1.46   1.35    0.62    0.22    0.20
18      fully adapted            0.70   3.08   2.14   1.35   0.86   0.69    0.92    0.17    0.17
18      fully adapted dynamic    2.55   6.46   9.85   4.79   1.05   0.63    0.88    0.08    0.08

Table 5.3: RMSE for the different market types and methods. The last three columns show the recovery rate of the sign of α, of A and of both, which determines the general type of market.


market method sd(α) sd(A) sd(β) sd(B) sd(µ) sd(λ)

1   normal SIR              2.11  5.54  1.12  1.05  1.51  1.29
1   fully adapted           0.56  1.13  0.30  0.43  0.37  0.40
1   fully adapted dynamic   0.48  1.02  0.36  0.40  0.32  0.21
5   normal SIR              2.60  6.72  1.57  1.29  1.44  1.45
5   fully adapted           1.47  4.38  0.46  1.13  1.37  1.35
5   fully adapted dynamic   1.24  3.53  2.01  1.96  1.15  0.66
9   normal SIR              2.49  6.11  1.33  1.44  1.53  1.51
9   fully adapted           2.49  5.72  1.50  1.30  1.72  1.15
9   fully adapted dynamic   0.31  1.32  0.28  2.56  0.21  0.22
10  normal SIR              2.54  5.98  1.25  1.33  1.37  1.45
10  fully adapted           2.82  5.72  1.51  1.17  1.56  0.96
10  fully adapted dynamic   0.10  3.54  0.73  0.13  0.16  0.11
15  normal SIR              2.55  5.77  1.10  1.26  1.45  1.64
15  fully adapted           0.36  2.15  0.52  0.43  0.66  0.59
15  fully adapted dynamic   0.67  1.66  1.09  0.76  0.51  0.49
18  normal SIR              2.18  5.01  1.10  1.28  1.43  1.54
18  fully adapted           0.28  1.36  0.28  0.45  0.46  0.40
18  fully adapted dynamic   0.54  0.82  0.61  0.41  0.32  0.29

Table 5.4: Standard deviation of the estimates over 50 repetitions for the different market types and methods.


Chapter 6

Application to financial data

Now that the methods for filtering and parameter estimation have been presented and tested on simulated data, it is time to try them on a real case study. The financial time series analysed is the Dow Jones Industrial Average (DJIA), displayed in figure 6.1, from 1970 to the beginning of 2012.

The chapter begins with a discussion of the critical aspect of selecting the hyper-parameters (section 6.1). Then the parameters are estimated (section 6.2) and the model is briefly checked (section 6.3). Given the estimated parameters, the features of the associated deterministic dynamical model are analysed (section 6.4). The filtering of the state of the system is then performed (section 6.5) before the chapter ends with an interpretation of the dynamic in terms of financial events (section 6.6).

6.1 Hyper-parameters

A couple of parameters are not estimated through the Particle Filter but need to be specified beforehand: namely, the variance terms of the Wiener processes (σf, σy and σx), the drift of the fundamental price (µf) and the characteristic time scale of the dynamic.

As discussed in section 2.3, the observed volatility of the stock price is influenced by both the variance of the mispricing X and that of the fundamental price pf. If we estimate the volatility from the observed data as the standard deviation of the log-returns Rt, it only remains to choose what fraction comes from the fundamental price and what fraction from the mispricing. In our simulations, we fixed it to 1/2, but in the present analysis of the DJIA we decided to give less importance to the fundamental price and to put more focus on identifying the underlying mispricing dynamic. More precisely:

σf = √( (1/10) · Var(Rt) / ∆t )    (6.1)

σx = √( (9/10) · Var(Rt) / ∆t )    (6.2)

The variance term of Y is more difficult to set, because its influence is more indirect. Some experiments tend to show that the outcome of the filtering does not depend too much on its


1970 1974 1978 1982 1986 1990 1994 1998 2002 2006 2010

1000

2000

5000

Figure 6.1: Daily data of the Dow Jones Industrial Average index from 1970 to 2011 inlogarithmic scale.

value in a reasonable range, so we fixed it to σy = σx. The drift term of the fundamentalprice, on the other hand, is set to the estimated averaged annual return, or more precisely:

µf = 250 R̄t    (6.3)

where R̄t denotes the sample mean of the daily log-returns.

The characteristic time scale of the dynamic has been fixed to 1 year, or approximately 250 days, based on qualitative considerations on the aspect of simulated stock prices. It could be changed easily, though, by adapting the value of ∆t and by scaling the parameters defined above appropriately.
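The hyper-parameter choices above can be sketched as follows. This is a minimal illustrative sketch in Python (the thesis code in the appendix is in R), assuming daily observations with ∆t = 1/250 and the 1/10 vs. 9/10 variance split of equations 6.1 and 6.2; the function name is ours:

```python
import numpy as np

def hyper_parameters(log_returns, delta_t=1/250, frac_fundamental=0.1):
    """Split the observed return variance between the fundamental price and
    the mispricing, and estimate the fundamental drift.

    frac_fundamental is the share of Var(R_t) attributed to the fundamental
    price (1/10 here, as in section 6.1; 1/2 was used in the simulations).
    """
    var_r = np.var(log_returns)  # per-step variance of the log-returns
    sigma_f = np.sqrt(frac_fundamental * var_r / delta_t)        # eq. 6.1
    sigma_x = np.sqrt((1 - frac_fundamental) * var_r / delta_t)  # eq. 6.2
    sigma_y = sigma_x               # fixed to sigma_x (section 6.1)
    mu_f = 250 * np.mean(log_returns)                            # eq. 6.3
    return dict(sigf=sigma_f, sigx=sigma_x, sigy=sigma_y, muf=mu_f)
```

By construction, σf² + σx² recovers the full annualized variance of the returns, whatever the chosen split.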

The way we tuned these hyper-parameters is of course somewhat arbitrary. It is nevertheless not very clear how to proceed otherwise. A means to check the adequacy of the model for the data is strongly needed. So far we have dealt only with simulated data, where the true state of the system and the parameters were known, and therefore it was possible to avoid this critical question. Having arisen only at the end of this work, this issue has not been fully addressed, and thus mainly qualitative considerations are taken into account. Nevertheless, one test has been done and will be discussed in the corresponding section.

6.2 Parameter estimation

To estimate the parameters of the dynamical system, we used the fully adapted Particle Filter with 20 000 particles. With such a huge number of particles it is not possible to store the whole evolution of the filtering, and only the last step is available. Nevertheless, the algorithm usually stabilizes after some time, and we can hope that this estimate is valid.

The algorithm has been run several times to ensure a robust estimation. Figure 6.2 displays the boxplot of the estimated parameters. It is interesting to notice that the estimates converge remarkably well. A is clearly negative, as is β. On the other hand, α seems to vary around zero. Even though the mean of α is slightly negative, we decided to pick a case where it is estimated as positive, because it leads to a more interesting dynamic.

Figure 6.2: Boxplot of estimated parameters for the DJIA data.

The market type is therefore characterized by slightly speculative individual strategies and a mean-reverting global behaviour. The levels of regulation and uncertainty are medium. The estimated parameters are summarized in table 6.1.

α       A       β       B       µ       λ
0.15    -2.67   -4.11   2.18    2.15    2.80

Table 6.1: Estimated parameters of the DJIA dynamic.

6.3 Model checking

We have estimated the model's parameters, but what tells us how well it explains the data? It is always possible to find an estimate, but it is more difficult to quantify how far the model is from reality. As explained above, this critical step has not been fully explored and merits further consideration. We can nevertheless proceed to the check based on the predictive distribution of the log-returns, which is described in section 4.4. The resulting histogram is displayed in figure 6.3. Compared to the one obtained from simulated data in figure 5.6, this one does not look good at all. Indeed, the distribution is far from uniform. The concave shape indicates that the true predictive distribution of the returns is narrower than the empirical one, meaning that our estimate is too broad. The opposite effect was expected, given the tendency of the Particle Filter to degenerate and explore only a small portion of the space.
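The check itself can be sketched as follows (an illustrative Python sketch, not the thesis code): for each time step, compute the probability integral transform u_t = FMC(Rt+1 | R1:t) as the fraction of predictive Monte Carlo samples below the realized return; under an adequate model these values are approximately Uniform(0, 1):

```python
import numpy as np

def pit_values(observed, predictive_samples):
    """Probability integral transform of each observation under its
    Monte Carlo predictive distribution.

    observed[t] is the realized return R_{t+1}; predictive_samples[t]
    holds draws of R_{t+1} given R_{1:t} (e.g. one-step-ahead
    simulations from the weighted particles).
    """
    return np.array([np.mean(draws <= r)
                     for r, draws in zip(observed, predictive_samples)])
```

A histogram of the resulting values corresponds to figure 6.3; a concave shape, as observed here, indicates a predictive distribution that is too broad.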

Figure 6.3: Histogram of FMC(Rt+1 | R1:t) for the DJIA data. If the model were adequate, it would look like a uniform distribution.

It is difficult to disentangle the possible causes for this. While our model assumes that the fundamental price follows a GBM, this could well not be the case. Empirical evidence seems to show that the tails of the distribution are much wider and that the volatility is clustered in time. While the first aspect could be accounted for in the mispricing dynamic, our model does not introduce any kind of heteroscedasticity. Another distribution of the returns of the fundamental price could also be the origin of the problem. Indeed, if we look at the distribution of the returns of the estimated fundamental price (as explained in section 6.5), they are clearly not normal. A second hypothesis is that the model allows the coexistence of different modes, or attractors, at the same time, producing a wider predictive distribution than the one observed. In this case, it would not mean that the model is wrong, but rather that the time series is not long enough.

6.4 Deterministic dynamical model

If one looks at the phase portrait of the deterministic dynamical system generated by the selected parameters (figure 6.4), interesting features can be observed. There seem to be two conventions apart from zero, but the attractors are actually not very well defined and look more like a wide flat zone around zero. On the other hand, a large limit cycle surrounds the region. This dynamic is similar to market type 10 in table 5.1, but the attractors are fuzzier and less distant, leading to a dynamic closer to equilibrium. The dynamic of the estimated system is between an equilibrium market (type 1 or 5) and a market with positive and negative conventions (type 9 or 10).
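The deterministic skeleton underlying figure 6.4 can be integrated numerically. The following is an illustrative Python sketch (the thesis code is in R), using the drift of equation 2.7 (cf. listing A.2) with the estimated parameters of table 6.1 and a simple Euler scheme:

```python
import numpy as np

# Estimated parameters from table 6.1
P = dict(alph=0.15, A=-2.67, bet=-4.11, B=2.18, mu=2.15, lamb=2.80)

def f_nonlin(x, y, p=P):
    """Drift of Y (equation 2.7; see listing A.2 for the R version)."""
    return (p["alph"]*x + p["bet"]*y
            + p["A"]*x**3*np.exp(-x**2/p["mu"]**2)
            + p["B"]*y**3*np.exp(-y**2/p["lamb"]**2))

def trajectory(x0, y0, dt=0.01, n=2000):
    """Euler integration of the noise-free system dX = Y dt, dY = f_NL dt.
    Plotting trajectories from a grid of starting points yields the
    phase portrait of figure 6.4."""
    xs = np.empty(n + 1); ys = np.empty(n + 1)
    xs[0], ys[0] = x0, y0
    for t in range(n):
        xs[t+1] = xs[t] + ys[t]*dt
        ys[t+1] = ys[t] + f_nonlin(xs[t], ys[t])*dt
    return xs, ys
```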

6.5 Filtering of the state

As we said before, running the Particle Filter with a huge number of particles prevents the storage of the evolution of the filtering. While this was not a problem for estimating the parameters, it is clearly an issue when one wants to recover the state of the system.


Figure 6.4: Phase portrait of the estimated dynamical system.


Furthermore, when the filtering is done on an augmented state including the parameters, the estimation of the state of the system is likely to be worse than when the parameters are known. Therefore the fully adapted Particle Filter is run one more time on the data, this time using as parameters the estimates obtained in the first stage, and with only 1 000 particles.

The beginning of the filtering is characterized by large fluctuations until the bad particles are eliminated. For this reason we remove a burn-in period of 5 years at the start of the series. The filtering of Y and X is visible in the first two panels of figure 6.5, together with 90% confidence intervals. More interpretation of the filtered mispricing will be given in section 6.6.

Even if in our model formulation the fundamental price plays only the role of a noisy component, it is still interesting to recover it. This is easily done by subtracting the estimated mispricing from the observed price. The third panel of figure 6.5 shows the DJIA in black and the estimated fundamental price in red. The scale is again logarithmic for better readability. It is interesting to notice how the steady evolution of the price, characteristic of a GBM with constant drift, is well modelled by the fundamental price, while the observed price deviates from it. A valid question at this point is what would have happened if the volatility of the fundamental price had been set to a higher value relative to the mispricing dynamic. Some of the observed deviations could probably be explained by a GBM with higher variance; nevertheless, the proposed model is more interesting in terms of interpretation and gives possible clues on the dynamic of the system.
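The recovery step itself is a one-liner; as an illustrative Python sketch (`log_price` and `x_filtered` are assumed to be aligned series, the latter being the filtered mean of X):

```python
import numpy as np

def fundamental_log_price(log_price, x_filtered):
    """Recover the log fundamental price by subtracting the filtered
    mispricing X_t from the observed log price (cf. section 6.5)."""
    return np.asarray(log_price) - np.asarray(x_filtered)
```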

6.6 Interpretation of the mispricing dynamic

The filtered dynamic of the mispricing shows very interesting features, such as periods of over- or underpricing and some abrupt changes. We have marked on figure 6.6 some important events that can be identified from the mispricing dynamic.

First, three broad periods appear clearly: the early 1980s recession, with a negative convention, the Great Moderation, with positive mispricing, and the global recession of the late 2000s, with again a negative mispricing convention.

Furthermore, some brutal price drops are associated with known financial events. Among them, the Black Monday crash of October 1987 is clearly shown. The bursting of the Internet bubble in 2002, also known as the Dotcom crash, or more faintly the influence of 09/11, are also visible. The 2007-2009 financial crisis marks the end of an overpricing period and the start of a more pessimistic time of global recession lasting ever since. The associated change of regime is really clear on the graph, as is the US housing bubble preceding the beginning of the crisis. The European sovereign debt crisis, and in particular the near-bankruptcy of Greece, is also visible at the end of the time series.

Figure 6.5: Filtering of X and Y for the DJIA data in the first two panels, with 90% confidence intervals. In the bottom panel, the DJIA (black) and the underlying estimated fundamental price (red), as recovered from the filtered mispricing process.

Figure 6.6: Mispricing of the DJIA and some important financial events, such as various crashes. The annotated events include the Oil Crisis, the early 80s recession, Black Monday, the Great Moderation, 09/11, the Dotcom crash, the US bear market, and the European debt crisis.


Chapter 7

Conclusion

In the present work, SMC methods, and in particular the Particle Filter, have been used for estimating the parameters and filtering the mispricing process of the model developed in Yukalov et al. (2009). Its main goal was to find a way to identify market types from an observed financial time series. Qualitative properties of the dynamic can be analysed, such as the existence of conventions and cycles. At the same time, conclusions can be drawn on the behaviour of agents, individually and collectively, as well as on general properties of a market, such as the level of regulation and the uncertainty about the fundamental value of a stock.

The problem has proven to be quite non-trivial and the results are not as bright as one could have expected. Nevertheless, by reducing the ambitions to a qualitative understanding of the market type, the results are very interesting.

The simulation study has helped to show the strengths and the limitations of the applied methods. It has demonstrated that one cannot hope to estimate the parameters precisely. On the other hand, the market type is usually well estimated, in particular when looking at the signs of α and A, which determine the global behaviour of the dynamic. The fully adapted Particle Filter developed in the present work has shown the best performance in terms of quality of estimates and convergence properties, which is a significant achievement of the present thesis.

The application of the model to the DJIA data was conducted at the end of this work with some surprising success. The estimated dynamical system showed some interesting properties. In particular, two conventions coexist in the neighbourhood of zero, giving rise to a dynamic close to equilibrium but allowing some systematic deviations from the fundamental value. Additionally, the filtered mispricing process helped to highlight some important financial events of the last 40 years, such as bubbles and crashes, and longer periods like the recent recession.

A critical question that has not been fully addressed in this work is how to test the model. One attempt has been made by comparing the actual data to their predictive distribution, but more should be developed along this line.

A related question is what we learn from fitting this model. Do we learn something about the generative process of stock prices, or is it just an interesting and fancy way to describe the data? Without a proper way to test the model, what we get is only a possible description, not much better than the Ptolemaic model of the solar system. It is not exactly clear, however, what should be done to reach a stronger conclusion, because the truth is obviously not observable. Indeed, the mispricing is only a concept, which can never be observed, and similarly for the market type. A possible way could be to systematically test some predictions. Forecasting could concern trajectories of prices, but also other more qualitative properties, such as the probability of jumping from one convention to another.

In the present work, the fundamental price has been modelled as a GBM, but further refinement could be achieved by using more complex models, especially models that account for heteroscedasticity, such as GARCH. Including correlation between the Wiener processes of the state dynamic could also be attempted.
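As an illustration of such a refinement, a GARCH(1,1) specification for the fundamental log-returns could replace the constant σf. The following is a hedged Python sketch; the parameter values are purely illustrative, not estimated from the DJIA:

```python
import numpy as np

def garch11_returns(n, mu=0.0, omega=1e-6, alpha=0.08, beta=0.9, seed=0):
    """Simulate GARCH(1,1) log-returns: r_t = mu + sigma_t * eps_t, with
    sigma_t^2 = omega + alpha*(r_{t-1}-mu)^2 + beta*sigma_{t-1}^2.
    This introduces the volatility clustering that the GBM lacks."""
    rng = np.random.default_rng(seed)
    r = np.empty(n)
    sig2 = omega / (1 - alpha - beta)  # start at the stationary variance
    for t in range(n):
        r[t] = mu + np.sqrt(sig2) * rng.standard_normal()
        sig2 = omega + alpha * (r[t] - mu)**2 + beta * sig2
    return r
```

Plugging such a return process into the model would, however, add three more hyper-parameters to tune, which connects to the remark below.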

Nevertheless, such refinements of the model would further increase the difficulty of tuning the hyper-parameters. Indeed, a criterion to estimate them properly is highly needed. This question is strongly connected with the problem of model checking mentioned above and should be the object of further investigations.

Another line of research is the development of better algorithms, especially for the estimation of the parameters. As has been suggested, an auxiliary Particle Filter looking more than one step ahead is expected to improve the performance. Another idea is to design smarter artificial dynamics of the parameters that would explore the space in a more structured and constrained way. This could also be related to the use of truly time-varying parameters, which has not been discussed so far.

Furthermore, it is sensible to think that one could use some external information to improve the quality of the estimates. This could be related to other economic indicators, such as interest rates, or to market regulations that are somehow connected to the parameters of the model. It is nevertheless very unclear how this could be done; it would probably involve the design of complex prior distributions.

Given the good results obtained on the DJIA, it would be very interesting to apply the model systematically to other financial indices or individual stocks and to develop an original framework for forecasting.


Bibliography

Bachelier, L. (1900). Théorie de la spéculation. Gauthier-Villars.

Beskos, A., O. Papaspiliopoulos, G. Roberts, and P. Fearnhead (2006). Exact and computationally efficient likelihood-based estimation for discretely observed diffusion processes (with discussion). Journal of the Royal Statistical Society: Series B (Statistical Methodology) 68(3), 333-382.

Black, F. and M. Scholes (1973). The pricing of options and corporate liabilities. The Journal of Political Economy 81, 637-654.

Carpenter, J., P. Clifford, and P. Fearnhead (1999). Improved particle filter for nonlinear problems. In Radar, Sonar and Navigation, IEE Proceedings, Volume 146, pp. 2-7. IET.

Doucet, A., N. De Freitas, and N. Gordon (2001). Sequential Monte Carlo Methods in Practice. Springer Verlag.

Doucet, A. and A. Johansen (2009). A tutorial on particle filtering and smoothing: Fifteen years later. Handbook of Nonlinear Filtering 12, 656-704.

Durham, G. and A. Gallant (2002). Numerical techniques for maximum likelihood estimation of continuous-time diffusion processes. Journal of Business and Economic Statistics 20(3), 297-338.

Eraker, B. (2001). MCMC analysis of diffusion models with application to finance. Journal of Business and Economic Statistics 19(2), 177-191.

Gordon, N., D. Salmond, and A. Smith (1993). Novel approach to nonlinear/non-Gaussian Bayesian state estimation. In Radar and Signal Processing, IEE Proceedings F, Volume 140, pp. 107-113. IET.

Künsch, H.-R. (2001). State space and hidden Markov models. In O. Barndorff-Nielsen, D. Cox, and C. Klüppelberg (Eds.), Complex Stochastic Systems, Volume 87, Chapter 3, pp. 109-173. CRC Press.

Liu, J. and M. West (2001). Combined parameter and state estimation in simulation-based filtering. In Doucet et al. (2001), Chapter 10, pp. 197-223.

Pitt, M. and N. Shephard (1999). Filtering via simulation: Auxiliary particle filters. Journal of the American Statistical Association 94, 590-599.

Samuelson, P. (1965). Proof that properly anticipated prices fluctuate randomly. Industrial Management Review 6(2), 41-49.

Sornette, D. (2004). Why Stock Markets Crash: Critical Events in Complex Financial Systems. Princeton University Press.

Werner, M., K. Ide, and D. Sornette (2011). Earthquake forecasting based on data assimilation: sequential Monte Carlo methods for renewal point processes. Nonlinear Processes in Geophysics 18, 49-70.

West, M. (1993). Approximating posterior distributions by mixtures. Journal of the Royal Statistical Society. Series B (Methodological) 55, 409-422.

Whitley, D. (1994). A genetic algorithm tutorial. Statistics and Computing 4(2), 65-85.

Yukalov, V. (1990). Self-similar approximations for strongly interacting systems. Physica A: Statistical Mechanics and its Applications 167(3), 833-860.

Yukalov, V., D. Sornette, and E. Yukalova (2009). Nonlinear dynamical model of regime switching between conventions and business cycles. Journal of Economic Behavior & Organization 70(1-2), 206-230.


Appendix A

Implementation

In this Appendix some code samples are presented. It starts with a minimal simulation of the model (section A.1) before showing the basic function of the Particle Filter (section A.2). Then the functions necessary for estimating the state with a normal SIR Particle Filter are included (section A.3). Finally, the implementation of the fully adapted Particle Filter is displayed and used to estimate the parameters within the augmented state approach (sections A.4 and A.5).

A.1 Simulate the model

Listing A.1: Simulate data from the model

## typical simulation of data from start.t to end.t with frequency 1/delta.t
## with params the named vector of parameters
delta.t <- 1/250
start.t <- 0
end.t <- 20
## number of ticks
N <- (end.t - start.t)/delta.t + 1
## x and y process
x <- y <- numeric(N)
x[1] <- 1
y[1] <- -1
## Wiener process for x and y
Wx <- params["sigx"]*sqrt(delta.t)*rnorm(N)
Wy <- params["sigy"]*sqrt(delta.t)*rnorm(N)

## SDE process
for (t in 2:N){
  x[t] <- x[t-1] + y[t-1]*delta.t + Wx[t]
  y[t] <- y[t-1] + f.nonlin(x[t-1], y[t-1], parameter=params)*delta.t + Wy[t]
}

## Price process:
q.0 <- 0  ## initial log-price
log.returns <- rnorm(N-1, mean = diff(x) + delta.t*params["muf"],
                     sd = params["sigf"]*sqrt(delta.t))

## create the final object:
state <- data.frame(y=y, x=x, q=cumsum(c(q.0, log.returns)))
state <- ts(state, start.t, end.t, deltat=delta.t)

## observations:
y <- log.returns

Listing A.2: Drift term of Y

f.nonlin <- function(x, y, parameter){
  ## Purpose: drift of y, given in equation 2.7
  ## --------------------------------------------------------------------------
  ## Arguments:
  ## x, y: X and Y components of the state
  ## parameter: either a named vector when the parameters are known,
  ##            or a data.frame of the n.rep particles if unknown
  ## --------------------------------------------------------------------------
  ## Author: Sylvain Robert, Date: Jul 2012
  ## Parameters known:
  if (is.null(nrow(parameter))){
    parameter["alph"]*x + parameter["bet"]*y +
      parameter["A"]*x^3*exp(-x^2/parameter["mu"]^2) +
      parameter["B"]*y^3*exp(-y^2/parameter["lamb"]^2)
  }
  ## Parameters unknown:
  else {
    parameter[,"alph"]*x + parameter[,"bet"]*y +
      parameter[,"A"]*x^3*exp(-x^2/parameter[,"mu"]^2) +
      parameter[,"B"]*y^3*exp(-y^2/parameter[,"lamb"]^2)
  }
}

A.2 Particle Filter

Listing A.3: General Particle Filter implementation

particle.filter <- function(x0, y, proposal, f.weight, delta.t, parameter){
  ## Purpose: implement the Particle Filter algorithm described in
  ## section 3.2.3 and algorithm 2
  ## --------------------------------------------------------------------------
  ## Arguments:
  ## x0: particles at time 0 (data.frame)
  ## y: observations (ts object or vector)
  ## proposal: simulates from the proposal q(X_t | Y_t, X_t-1)
  ## f.weight: weight function
  ## delta.t: 1/frequency of observations
  ## parameter: c(alph, bet, A, B, mu, lamb, sigx, sigy, sigf, muf)
  ## Outputs: list of two elements
  ## -$x: three-dimensional array with the evolution of the particle
  ##      states in time
  ## -$lambda: matrix (time, particles) with the weights of the particles
  ##           evolving in time
  ## --------------------------------------------------------------------------
  ## Author: Sylvain Robert, Date: Jul 2012
  n.rep <- nrow(x0)  # number of particles
  N <- length(y)     # number of observations
  d <- ncol(x0)      # number of hidden variables (dim(state))
  x <- x0            # initialization

  ## 3d-array [time, particles, state]:
  ## store n.rep particles with d-dim state at each time step
  x.output <- array(0, dim=c(N+1, n.rep, d),
                    dimnames=list(time=as.character(0:N),
                                  particles=paste("particle:", as.character(1:n.rep)),
                                  states=names(x0)))
  x.output[1,,] <- as.matrix(x)
  ## weights matrix
  lambda <- matrix(1/n.rep, nrow=N+1, ncol=n.rep)
  ## FILTERING:
  for (i in 1:N){
    ## resampling step:
    index <- systematic.sample(n.rep, lambda[i,])
    ## propagation step:
    x <- proposal(x[index,], delta.t, n.rep, parameter, y=y[i], i)
    ## weighting of particles:
    w <- f.weight(x, y[i], delta.t, parameter)
    lambda[i+1,] <- w/sum(w)
    ## unpack the state from the proposal output:
    x <- x$state
    x.output[i+1,,] <- as.matrix(x)
  }
  return(list(x=x.output, lambda=lambda))
}

Listing A.4: Balanced sampling

systematic.sample <- function(size=length(prob), prob){
  ## Purpose: generate a balanced sample from (1, 2, ..., length(prob)),
  ## as described in section 3.2.4 and algorithm 3
  ## --------------------------------------------------------------------------
  ## Arguments: size: sample size
  ##            prob: vector of probabilities
  ## --------------------------------------------------------------------------
  ## Author: Hans-Ruedi Kuensch, Date: 9 Jun 2005, 14:22
  w <- floor(size*cumsum(prob) + runif(1))
  w <- w - c(0, w[-length(prob)])
  rep(1:length(prob), w)
}

A.3 Normal SIR Particle Filter

Listing A.5: Typical script for a normal SIR Particle Filter with known parameters

## typical call for a normal SIR Particle Filter with known parameters,
## assuming that y is the vector of observations and params the parameters
n.rep <- 1000
delta.t <- 1/250
mean.state <- c(x=0, y=0)
pf.0 <- initialize.state(n.rep, mean.state, type="unif", min=-3, max=3)
pf.model <- particle.filter(x0=pf.0, y=y, proposal=diff.process,
                            f.weight=likelihood, delta.t=delta.t,
                            parameter=params)
## averaged filtering:
x <- pf.model$x
nobs <- length(y)
x.mean <- data.frame(x[,1,])
for (i in 1:nobs){
  x.mean[i,] <- apply(x[i,,], 2, weighted.mean, w=pf.model$lambda[i,])
}

Listing A.6: Diffusion process

### transition of the diffusion process
diff.process <- function(state, delta.t, n.rep, parameter, y, i, ...){
  ## Purpose: transition of the diffusion process for n.rep particles
  ## --------------------------------------------------------------------------
  ## reproduces equations 2.13 and 2.14:
  ## X_t+1 = X_t + Y_t*delta.t + Wx
  ## Y_t+1 = Y_t + f^NL(X_t, Y_t)*delta.t + Wy
  ## --------------------------------------------------------------------------
  ## Arguments:
  ## state: particle states (data.frame)
  ## delta.t: 1/frequency of observations
  ## parameter: c(sigx, sigy)
  ## y and i are not used for the normal SIR PF
  ##
  ## Outputs: list of two elements
  ## -$state: states of the particles at time t+1
  ## -$delta: variation from t to t+1 for X and Y
  ## --------------------------------------------------------------------------
  ## Author: Sylvain Robert, Date: Jul 2012

  ## increments of the Wiener processes:
  Wx <- parameter["sigx"]*sqrt(delta.t)*rnorm(n.rep)
  Wy <- parameter["sigy"]*sqrt(delta.t)*rnorm(n.rep)

  ## drift terms:
  drift.x <- state[,"y"]*delta.t
  drift.y <- f.nonlin(state[,"x"], state[,"y"], parameter)*delta.t

  ## new state:
  state[,"x"] <- state[,"x"] + drift.x + Wx
  state[,"y"] <- state[,"y"] + drift.y + Wy

  ## output
  return(list(state=state,
              delta=data.frame(delta.x=drift.x+Wx, delta.y=drift.y+Wy)))
}

Listing A.7: Observation process

likelihood <- function(state, y, delta.t, parameter){
  ## Purpose: measurement density of R_t | (X_t, X_t-1)
  ## --------------------------------------------------------------------------
  ## normal density centered at (X_t - X_t-1) + delta.t*muf,
  ## with standard deviation sqrt(delta.t)*sigf
  ## --------------------------------------------------------------------------
  ## Arguments:
  ## state: list of two elements:
  ##   -$state: state vector of the particle at t
  ##   -$delta: variation from t-1 to t for X and Y
  ## y: observation at time t
  ## delta.t: 1/frequency of observations
  ## parameter: c(..., sigf, muf)
  ##
  ## Outputs: likelihood of observation y | state (vector)
  ## --------------------------------------------------------------------------
  ## Author: Sylvain Robert, Date: Jul 2012
  dnorm(y, mean = state$delta[,"delta.x"] + delta.t*parameter["muf"],
        sd = parameter["sigf"]*sqrt(delta.t))
}

Listing A.8: Initialization of particles

initialize.state <- function(n.rep, mean.state, type=c("normal","unif"), ...){
  ## Purpose: generate n.rep particles
  ## --------------------------------------------------------------------------
  ## Arguments:
  ## n.rep: number of particles
  ## mean.state: named vector with the mean of each variable
  ## 2 types with additional arguments ...:
  ##   -normal: iid Normal(mean.state, var.state)
  ##   -unif: iid Unif around mean.state (min, max)
  ## Outputs: data.frame with each row a particle
  ## --------------------------------------------------------------------------
  ## Author: Sylvain Robert, Date: Jul 2012
  d <- length(mean.state)
  ## noise component
  if (type=="normal") noise <- replicate(d, rnorm(n.rep)) *
      matrix(..., nrow=n.rep, ncol=d, byrow=TRUE)
  if (type=="unif"){
    noise <- runif(n.rep, ...)
    for (i in 2:d) noise <- cbind(noise, runif(n.rep, ...))
  }
  ## mean.state + noise
  state.0 <- matrix(mean.state, nrow=n.rep, ncol=d, byrow=TRUE) + noise
  state.0 <- as.data.frame(state.0)
  names(state.0) <- names(mean.state)
  return(state.0)
}

A.4 Fully adapted Particle Filter

Listing A.9: Adapted proposal

adapted.proposal <- function(state, delta.t, n.rep, parameter, y, i){
  ## Purpose: proposal of the fully adapted Particle Filter described
  ## in section 3.3.2, whose distribution is given by equation 3.25
  ## --------------------------------------------------------------------------
  ## q = p(Z_t+1 | R_t+1, Z_t)
  ## --------------------------------------------------------------------------
  ## Arguments:
  ## state: particle states (data.frame, augmented with the parameters;
  ##        params holds the hyper-parameters sigf and muf)
  ## delta.t: 1/frequency of observations
  ## parameter: c(sigx, sigy, sigf)
  ## i is not used for the present algorithm
  ##
  ## Outputs: list of two elements
  ## -$state: states of the particles at time t+1
  ## -$delta: variation from t to t+1 for X and Y
  ## --------------------------------------------------------------------------
  ## Author: Sylvain Robert, Date: Jul 2012

  ## euler step
  delta.x <- state[,"y"]*delta.t +
    state[,"sigx"]^2/(state[,"sigx"]^2 + params["sigf"]^2) *
      (y - (state[,"y"] + params["muf"])*delta.t) +
    rnorm(n.rep, mean=0,
          sd=sqrt(state[,"sigx"]^2*params["sigf"]^2 /
                  (state[,"sigx"]^2 + params["sigf"]^2))*sqrt(delta.t))
  delta.y <- f.nonlin(state[,"x"], state[,"y"], parameter=state)*delta.t +
    rnorm(n.rep, mean=0, sd=state[,"sigy"]*sqrt(delta.t))

  ## proposal
  state[,"x"] <- state[,"x"] + delta.x
  state[,"y"] <- state[,"y"] + delta.y

  return(list(state=state, delta=data.frame(delta.x, delta.y)))
}

Listing A.10: Adapted measurement (predictive distribution)

adapted.measure <- function(state, y, delta.t, parameter){
  ## Purpose: measurement density of R_t+1 | Z_t, given in equation 3.24
  ## --------------------------------------------------------------------------
  ## Arguments:
  ## state: list of two elements:
  ##   -$state: state vector of the particle at t
  ##   -$delta: variation from t-1 to t for X and Y
  ## y: observation at time t
  ## delta.t: 1/frequency of observations
  ## parameter: c(..., sigf, muf)
  ##
  ## Outputs: vector of predictive probabilities
  ## --------------------------------------------------------------------------
  ## Author: Sylvain Robert, Date: Jul 2012

  dnorm(y, mean=(state$state[,"y"] - state$delta[,"delta.y"] +
                 parameter["muf"])*delta.t,
        sd=sqrt(parameter["sigf"]^2 + state$state[,"sigx"]^2)*sqrt(delta.t))
}

A.5 Parameter estimation

Listing A.11: Typical script for a fully adapted Particle Filter with unknown parameters

## typical call for a fully adapted Particle Filter with unknown parameters
## assuming that y is the vector of observations
## and params the hyper-parameters (sigx, sigy, sigf, muf)
n.rep <- 1000
mean.state <- c(x=0, y=0)
pf.0 <- initialize.state(n.rep, mean.state, type="unif", min=-3, max=3)
pf.model <- particle.filter(x0=pf.0, y=y, f.state=adapted.proposal,
                            f.observation=adapted.measure, parameter=params)

## and using the light version of the particle filter:
n.rep <- 10000
mean.state <- c(x=0, y=0)


pf.0 <- initialize.particles(n.rep, mean.state, type="unif", min=-3, max=3)
pf.light <- particle.filter.light(x0=pf.0, y=y, f.state=adapted.proposal,
                                  f.observation=adapted.measure, parameter=params)

## estimate of the parameters:
estimates <- apply(pf.light$x[,], 2, weighted.mean, w=pf.light$lambda)
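The weighted-mean estimator on the last line can be sanity-checked on toy numbers (the particle values and weights below are illustrative, not from the thesis):

```r
## Toy check of the weighted-mean parameter estimate:
## three particles carrying (alph, A), with normalized weights lambda.
x <- cbind(alph=c(-1, -2, -3), A=c(2, 4, 6))
lambda <- c(0.5, 0.3, 0.2)
estimates <- apply(x, 2, weighted.mean, w=lambda)
## estimates is c(alph=-1.7, A=3.4)
```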

Listing A.12: Initialization of particles in the augmented scheme

initialize.particles <- function(n.rep, mean.state, parameter=params, ...){
  ## Purpose: generate n.rep particles
  ## ----------------------------------------------------------------------------
  ## Arguments:
  ##   n.rep: number of particles
  ##   mean.state: named vector with the mean of each variable (Y and X)
  ##   parameter: named vector of hyper-parameters
  ##   ...: arguments to pass to initialize.state()
  ## Outputs: data.frame with each row a particle
  ## ----------------------------------------------------------------------------
  ## Author: Sylvain Robert, Date: Jul 2012
  d <- length(mean.state)
  ## ----------------------------------------------------------------------------
  ## Initialize X and Y:
  state <- initialize.state(n.rep, mean.state, ...)
  ## ----------------------------------------------------------------------------
  ## Initialize parameters
  ## A and alph: sample one of three sign configurations:
  ## alph<0 and A<0, alph<0 and A>0, or alph>0 and A<0
  signs <- sample(1:3, n.rep, replace=TRUE)
  state[,"alph"] <- ifelse(signs==1, runif(n.rep, -5, 0),
                    ifelse(signs==2, runif(n.rep, -5, 0),
                                     runif(n.rep, 0, 5)))
  state[,"A"] <- ifelse(signs==1, runif(n.rep, -12, 0),
                 ifelse(signs==2, runif(n.rep, 0, 12),
                                  runif(n.rep, -12, 0)))
  ## bet, B, mu, lamb
  state[,"bet"]  <- runif(n.rep, -5, 0)   # never positive
  state[,"B"]    <- runif(n.rep, 0, 5)    # should be positive
  state[,"mu"]   <- runif(n.rep, 0.1, 6)
  state[,"lamb"] <- runif(n.rep, 0.1, 6)
  ## hyper-parameters
  state[,"sigx"] <- parameter["sigx"]
  state[,"sigy"] <- parameter["sigy"]
  state[,"sigf"] <- parameter["sigf"]
  state[,"muf"]  <- parameter["muf"]
  return(state)
}

Listing A.13: Light version of the Particle Filter

particle.filter.light <- function(x0, y, f.state, f.observation,
                                  parameter=params, verbose=TRUE){
  ## Purpose: equivalent to the general Particle Filter algorithm A.2,
  ## but with a smaller storage requirement (only the last step is kept);
  ## useful for parameter estimation with a large number of particles
  ## ----------------------------------------------------------------------------
  ## Author: Sylvain Robert, Date: Jul 2012
  n.rep <- nrow(x0)
  N <- length(y)
  d <- ncol(x0)


  x <- x0
  lambda <- rep(1/n.rep, n.rep)

  for (i in 1:N){
    if (verbose && i %% round(N/10) == 0) cat("time = ", i, "\n")
    ## resampling
    index <- bal.sample(n.rep, lambda)
    ## index <- sample.int(n.rep, prob=lambda, replace=TRUE)
    ## propagation
    x <- f.state(x[index,], y=y[i], parameter=parameter, n.rep=n.rep, i=i)
    ## reweighting
    w <- f.observation(x, y[i], parameter=parameter)
    lambda <- w/sum(w)
    ## packing
    x <- x$state
  }
  return(list(x=x, lambda=lambda))
}
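The resampling routine bal.sample is not reproduced in this appendix. The sketch below is one plausible implementation of a balanced (systematic) resampling step with the same interface; it is an assumption about its behaviour, not the thesis code:

```r
## Hypothetical sketch of bal.sample: systematic ("balanced") resampling.
## One uniform offset generates n.rep evenly spaced points, which are
## pushed through the cumulative weights; returns resampled indices.
bal.sample <- function(n.rep, lambda){
  u <- (runif(1) + 0:(n.rep-1)) / n.rep
  findInterval(u, cumsum(lambda)) + 1
}
```

This scheme keeps the number of copies of each particle within one of its expected value, which reduces resampling variance compared with multinomial sampling.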