

Extreme Value Theory in Time Series Analysis

Richard A. Davis

Columbia University

http://www.stat.columbia.edu/~rdavis


Game Plan

Quick overview of classical extreme value theory

Extremal types theorem

Generalized extreme value distributions

Examples

Domain of attraction

Generalized Pareto distribution

Point Processes

Basic results

Application to extremes

Extensions to stationary time series

Extremal index

Mixing conditions

Point process examples

Regular variation

Univariate

Multivariate

Estimation

Examples

Application to Crystal River Flows

Bonus example!!


References

Bingham, N.H., Goldie, C.M. and Teugels, J.L. (1987) Regular Variation. Cambridge University Press, Cambridge.

Embrechts, P., Klüppelberg, C. and Mikosch, T. (1997) Modelling Extremal Events for Insurance and Finance. Springer, Berlin.

Galambos, J. (1978) The Asymptotic Theory of Extreme Order Statistics. Wiley, New York.

de Haan, L. and Ferreira, A. (2006) Extreme Value Theory: An Introduction. Springer, New York.

Leadbetter, M.R., Lindgren, G. and Rootzén, H. (1983) Extremes and Related Properties of Random Sequences and Processes. Springer, Berlin.

Resnick, S.I. (1986) Point processes, regular variation and weak convergence. Adv. Appl. Prob. 18, 66–138.

Resnick, S.I. (1987) Extreme Values, Regular Variation, and Point Processes. Springer, New York.

Resnick, S.I. (2007) Heavy-Tail Phenomena: Probabilistic and Statistical Modeling. Springer, New York.

Samorodnitsky, G. and Taqqu, M.S. (1994) Stable Non-Gaussian Random Processes: Stochastic Models with Infinite Variance. Chapman and Hall, London.


Classical Extreme Value Theory

Setup:

• {Xt} ~ IID(F)

• Mn = max{X1, . . . , Xn}

• P(Mn ≤ x) = F^n(x)

Now in order for the right-hand side to converge to a nonzero value, we must let x → ∞ with n. Replacing x by un,

P(Mn ≤ un) = F^n(un)
           = (1 - (1 - F(un)))^n
           = (1 - n(1 - F(un))/n)^n → e^{-τ}

if and only if

n(1 - F(un)) → τ.
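As a quick numerical illustration (not part of the original slides), the following Python sketch checks F^n(un) against e^{-τ} for an exponential F, with un chosen so that n(1 - F(un)) = τ; the function name is ad hoc.

```python
import numpy as np

def check_max_limit(tau=1.0, ns=(10, 100, 1000, 10000)):
    """For F exponential(1), pick u_n so that n(1 - F(u_n)) = tau and
    compare P(M_n <= u_n) = F(u_n)^n with the limit exp(-tau)."""
    for n in ns:
        u_n = np.log(n / tau)                  # solves n * exp(-u_n) = tau
        p_exact = (1.0 - np.exp(-u_n)) ** n    # F(u_n)^n
        print(f"n={n:6d}  F^n(u_n)={p_exact:.6f}  exp(-tau)={np.exp(-tau):.6f}")

check_max_limit()
```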


Classical EVT— Extremal Types Theorem

Convergence of types: Now taking un = an x + bn, an > 0,

P(an^{-1}(Mn - bn) ≤ x) = F^n(an x + bn) → G(x)

if and only if

n(1 - F(an x + bn)) → -log G(x).

Theorem. If G is a nondegenerate distribution, then G has to be one of the three types:

1. G(x) = exp(-e^{-x}) (Gumbel)

2. G(x) = exp(-x^{-α}), x > 0 (Fréchet)

3. G(x) = exp(-(-x)^{α}), x ≤ 0 (Weibull)

Classical EVT— GEV

The three types of extreme value distributions can be parameterized as one family (GEV family) via

G(x) = exp(-(1 + ξx)^{-1/ξ}),  1 + ξx > 0,

where

ξ = 0 implies G(x) = exp(-e^{-x}) (Gumbel)

ξ > 0 implies G(x) is Fréchet (α = 1/ξ)

ξ < 0 implies G(x) is Weibull (x < -1/ξ, α = -1/ξ)

Summary

• Gumbel—light tailed

• Fréchet—heavy tailed

• Weibull—finite endpoint
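To make the parameterization concrete, here is a small Python sketch (not from the slides) of the GEV cdf, treating ξ = 0 as the Gumbel limit; the function name and test values are illustrative.

```python
import numpy as np

def gev_cdf(x, xi):
    """GEV cdf G(x) = exp(-(1 + xi*x)^(-1/xi)) on {1 + xi*x > 0};
    the case xi = 0 is taken to be the Gumbel limit exp(-exp(-x))."""
    x = np.asarray(x, dtype=float)
    if xi == 0.0:
        return np.exp(-np.exp(-x))
    z = 1.0 + xi * x
    # outside the support: G = 0 to the left (xi > 0), G = 1 past the endpoint (xi < 0)
    out = np.full_like(z, 0.0 if xi > 0 else 1.0)
    inside = z > 0
    out[inside] = np.exp(-z[inside] ** (-1.0 / xi))
    return out

print(gev_cdf([0.0, 1.0], xi=0.0))    # Gumbel
print(gev_cdf([1.0, 5.0], xi=0.5))    # Frechet-type (alpha = 2)
print(gev_cdf([-1.0, 1.0], xi=-0.5))  # Weibull-type, finite right endpoint at 2
```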


Classical EVT— Examples

In looking at maxima, usually one considers only the Gumbel and Fréchet distributions for modeling.

Examples

1. Uniform. F(x) = x, 0 < x < 1. With an = 1/n, bn = 1, we have for n large

n(1 - F(an x + bn)) = -x = -log G(x) = -log(exp(-(-x)^1)),

P(n(Mn - 1) ≤ x) → e^{x} (x ≤ 0). (Weibull limit)

2. Exponential. F(x) = 1 - e^{-x}, x > 0. With an = 1, bn = log n, we have

n(1 - F(an x + bn)) = n e^{-x - log n} = e^{-x} = -log(exp(-e^{-x})),

P(Mn - log n ≤ x) → exp(-e^{-x}). (Gumbel limit)

Application to Fisher's test for hidden periodicity in time series (see B&D (1991) and Davis and Mikosch (1999)).

Classical EVT— Examples

Exponential. P(Mn - log n ≤ x) → exp(-e^{-x})

[Figure: empirical distribution of Mn - log n, n = 100, and the limit density.]
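A simulation sketch (not from the slides), assuming numpy is available, comparing the empirical distribution of Mn - log n with the Gumbel limit at a few points.

```python
import numpy as np

rng = np.random.default_rng(0)

def gumbel_check(n=100, reps=20000):
    """Simulate M_n - log n for iid Exp(1) samples and compare the
    empirical cdf with the Gumbel limit exp(-exp(-x))."""
    maxima = rng.exponential(size=(reps, n)).max(axis=1) - np.log(n)
    for x in (-1.0, 0.0, 1.0, 2.0):
        emp = (maxima <= x).mean()
        print(f"x={x:4.1f}  empirical={emp:.4f}  Gumbel={np.exp(-np.exp(-x)):.4f}")

gumbel_check()
```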


Classical EVT— Examples

3. Pareto. F(x) = 1 - x^{-α}, x > 1. With an = n^{1/α}, bn = 0, we have for n large

n(1 - F(an x)) = x^{-α} = -log G(x) = -log(exp(-x^{-α})),

P(n^{-1/α} Mn ≤ x) → exp(-x^{-α}) (x > 0). (Fréchet limit)

[Figure: empirical density (blue) and limit density (red).]

Classical EVT— Examples

4. Gaussian.

an = (2 log n)^{-1/2},  bn = (2 log n)^{1/2} - 0.5 (2 log n)^{-1/2}(log log n + log(4π)),

n(1 - F(an x + bn)) → e^{-x} = -log(exp(-e^{-x})),

P((2 log n)^{1/2}(Mn - bn) ≤ x) → exp(-e^{-x}). (Gumbel limit)

[Figure: empirical density and cdf of the normalized Gaussian maximum for n = 100, with the Gumbel limit superimposed.]
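A simulation sketch (not from the slides) of the normalized Gaussian maximum using the norming constants above; the sample sizes are illustrative and convergence is notoriously slow in this case, which is what the following figures display.

```python
import numpy as np

rng = np.random.default_rng(1)

def norming(n):
    """Gumbel norming constants for iid standard normal maxima."""
    b = np.sqrt(2 * np.log(n)) - 0.5 * (np.log(np.log(n)) + np.log(4 * np.pi)) / np.sqrt(2 * np.log(n))
    a = 1.0 / np.sqrt(2 * np.log(n))
    return a, b

def gaussian_max_check(n, reps=2000, x=1.0):
    a, b = norming(n)
    maxima = np.array([rng.standard_normal(n).max() for _ in range(reps)])
    emp = ((maxima - b) / a <= x).mean()
    print(f"n={n:7d}  empirical P={emp:.4f}  Gumbel limit={np.exp(-np.exp(-x)):.4f}")

for n in (100, 1000, 10000):
    gaussian_max_check(n)
```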


Classical EVT— Examples

[Figure: empirical density and cdf of the normalized Gaussian maximum for n = 1000 and n = 10000, with the Gumbel limit superimposed.]

Classical EVT— Examples

[Figure: empirical density and cdf of the normalized Gaussian maximum for n = 1000 and n = 100000, with the Gumbel limit superimposed.]


Classical EVT— Domain of Attraction

If

P((Mn - bn)/an ≤ x) = F^n(an x + bn) → G(x),

which is equivalent to

n(1 - F(an x + bn)) → -log G(x) = (1 + ξx)^{-1/ξ},

then we say that F (or X) is in the domain of attraction of G (write F or X ∈ D(G)).

Note that if x = 0, then

n(1 - F(bn)) → 1,

or

P(X1 > bn) ~ 1/n.

It follows that

P(X1 - bn > an x | X1 > bn) → (1 + ξx)^{-1/ξ}.

Classical EVT— GPD

P(X1 - bn > an x | X1 > bn) → (1 + ξx)^{-1/ξ}

Replacing bn with u, it follows that the conditional distribution of X1 - u given X1 > u has an approximate generalized Pareto distribution,

P(X1 - u ≤ an x | X1 > u) ≈ H(x) = 1 - (1 + ξx)^{-1/ξ}

for u large.

Generalized Pareto Distribution (GPD)

H(x) = 1 - (1 + ξ(x - μ)/σ)^{-1/ξ}

The limit above is the basis for fitting the tail of the distribution of a random variable using a GPD. There is a myriad of procedures for estimating the parameters in this model.
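A minimal peaks-over-threshold sketch (not the procedure used in the slides), assuming scipy is available; the Student-t data and the 95% threshold are purely illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Illustrative heavy-tailed sample (Student t with 3 df, so alpha = 3).
x = rng.standard_t(df=3, size=5000)

# Peaks over a high threshold u; fit a GPD to the exceedances x - u.
u = np.quantile(x, 0.95)
exc = x[x > u] - u
xi, loc, sigma = stats.genpareto.fit(exc, floc=0.0)   # fix location at 0
print(f"threshold u={u:.3f}, shape xi={xi:.3f} (approx 1/alpha), scale sigma={sigma:.3f}")

# Tail probability estimate: P(X > u + y) ~ P(X > u) * (1 + xi*y/sigma)^(-1/xi)
y = 5.0
p_tail = (len(exc) / len(x)) * stats.genpareto.sf(y, xi, loc=0.0, scale=sigma)
print(f"estimated P(X > {u + y:.2f}) = {p_tail:.5f}")
```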


Classical EVT— Domains of Attraction

Domains of attraction: There are necessary and sufficient conditions for F ∈ D(G) for the three extreme value distributions. The heavy-tailed Fréchet, which is perhaps the most commonly used extreme value distribution, has the easiest necessary and sufficient condition to state (and check!). In this case,

F ∈ D(exp(-x^{-α})) if and only if F is RV(α) for some α > 0.

Regular variation: F is RV(α) if and only if

(1 - F(tx))/(1 - F(t)) = P(X > tx)/P(X > t) → x^{-α}, as t → ∞,

for every x > 0.

Classical EVT — Domains of Attraction

Examples:

Pareto,
log-gamma,
infinite variance stable,
Cauchy,
Student t,
Fréchet.

Remarks:

• P(X > x) = L(x) x^{-α} where L is slowly varying, L(tx)/L(t) → 1 (think of the log function for L).

• Can take bn = 0 and an as the 1 - 1/n quantile of F, i.e., n(1 - F(an)) → 1. It turns out that an = L1(n) n^{1/α} for a slowly varying function L1.

• P(an^{-1} Mn ≤ x) → exp(-x^{-α}).


Point Processes

The theory of point processes plays a central role in extreme value

theory. Applications include:

• Derivation of joint limiting distribution of order statistics, i.e.,

kth largest order statistic, limiting distribution of maximum

and minimum, etc.

• Calculation of limit distribution of exceedances of a high

level.

• Extensions to stationary processes.

• Provides a useful tool in the heavy-tailed case for deriving the limiting behavior of various statistics, e.g., the sample mean, sample autocovariances, etc., which are often determined by the behavior of the extreme order statistics.

Point Processes — Definition and Basic Results.

Setup:

• Suppose (Xt) is an iid sequence with common distribution F.

• F ∈ D(exp(-x^{-α})). (Can be formulated for a general extreme value distribution, but for simplicity we will restrict attention to Fréchet.)

We have

n(1 - F(an x)) = nP(an^{-1} X > x) → x^{-α},

from which it follows that

nP(an^{-1} X ∈ (a, b]) → a^{-α} - b^{-α} =: ν(a, b],

where ν is the measure on (0, ∞] given by ν(dx) = α x^{-α-1} dx.

More generally, we have

nP(an^{-1} X ∈ B) → ν(B)

for all nice sets B, those bounded away from 0.


Point Processes — Definition and Basic Results.

Since

nP(an^{-1} X ∈ B) → ν(B)

for nice sets B, it follows that

nP(an^{-1} X ∈ ·) →v ν(·),

where →v denotes vague convergence of measures.

Now for a set B bounded away from 0, define the sequence of point processes by

Nn(B) = #{t = 1, . . . , n : an^{-1} Xt ∈ B} = Σ_{t=1}^{n} ε_{an^{-1} Xt}(B),

where ε_x is the Dirac measure at x.

Point Processes — Definition and Basic Results.

Nn(B) = #{t = 1, . . . , n : an^{-1} Xt ∈ B} = Σ_{t=1}^{n} ε_{an^{-1} Xt}(B)

Properties:

• Nn(B) ~ Bin(n, pn), where pn = P(an^{-1} X ∈ B), and since

nP(an^{-1} X ∈ B) → ν(B),

it follows from the convergence of the binomial to the Poisson that

Nn(B) →d N(B),

where N(B) is a Poisson random variable with mean ν(B).

• In fact, we have the stronger point process convergence,

Nn →d N,

where N is a Poisson process on (0, ∞] with mean measure ν(dx) and →d denotes convergence in distribution of point processes.
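A small simulation sketch (not from the slides) of the binomial-to-Poisson step: for Pareto(α) data and B = (b, ∞), the count Nn(B) of scaled observations in B should have mean and variance close to ν(B) = b^{-α}; the parameter choices are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

def exceedance_counts(n=1000, alpha=2.0, b=1.0, reps=5000):
    """Count N_n(B) = #{t <= n : X_t / a_n in (b, inf)} for Pareto(alpha) data,
    a_n = n^(1/alpha), and compare mean/variance with nu(B) = b**(-alpha)."""
    a_n = n ** (1.0 / alpha)
    x = rng.pareto(alpha, size=(reps, n)) + 1.0      # P(X > x) = x^-alpha, x >= 1
    counts = (x / a_n > b).sum(axis=1)
    nu_B = b ** (-alpha)
    print(f"mean count = {counts.mean():.3f}, var = {counts.var():.3f}, nu(B) = {nu_B:.3f}")

exceedance_counts()
```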


Point Processes — Definition and Basic Results.

Convergence for point processes.

For our purposes, →d for point processes means that for any collection of Borel sets B1, . . . , Bk that are bounded away from 0 and satisfy P(N(∂Bj) > 0) = 0, j = 1, . . . , k, we have

(Nn(B1), . . . , Nn(Bk)) →d (N(B1), . . . , N(Bk)).

Point Processes — Application to Extremes

Application

Define Mn,2 to be the second largest among X1, . . . , Xn. Since the event

{an^{-1} Mn,2 ≤ y} is the same as {Nn(y, ∞) ≤ 1},

we conclude from the point process convergence that

P(an^{-1} Mn,2 ≤ y) = P(Nn(y, ∞) ≤ 1)
                    → P(N(y, ∞) ≤ 1)
                    = exp{-y^{-α}}(1 - log(exp{-y^{-α}})) = exp{-y^{-α}}(1 + y^{-α}).


Point Processes — Application to Extremes

Application

Similarly, the joint limiting distribution of (Mn, Mn,2) can be calculated by noting that, for y ≤ x,

{an^{-1} Mn ≤ x, an^{-1} Mn,2 ≤ y} is the same as {Nn(x, ∞) ≤ 0, Nn(y, x] ≤ 1}.

Hence,

P(an^{-1} Mn ≤ x, an^{-1} Mn,2 ≤ y) = P(Nn(x, ∞) ≤ 0, Nn(y, x] ≤ 1)
                                    → P(N(x, ∞) ≤ 0, N(y, x] ≤ 1)
                                    = exp{-y^{-α}}(1 + y^{-α} - x^{-α}).

Point Processes — Representation of Limit.

Representation of limit Poisson process

The points of the limit Poisson process can be displayed in an explicit fashion. Set

Γk = E1 + · · · + Ek,

where E1, E2, . . . are iid unit exponentials. Then

Nn = Σ_{t=1}^{n} ε_{an^{-1} Xt} →d N,

and the limit Poisson process N has the representation

N = Σ_{k=1}^{∞} ε_{Γk^{-1/α}}.

If we order the data, then we can read off the weak convergence for the kth largest Mn,k, i.e.,

an^{-1} Mn,k →d Γk^{-1/α} (jointly in k).
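A simulation sketch (not from the slides) comparing a few quantiles of an^{-1} Mn,k with those of Γk^{-1/α} for Pareto(α) data; sample sizes and seeds are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)

def compare_order_stats(n=2000, alpha=2.0, k_max=3, reps=5000):
    """Compare quantiles of a_n^{-1} M_{n,k} (kth largest of Pareto(alpha) data,
    a_n = n^{1/alpha}) with those of Gamma_k^{-1/alpha}, Gamma_k a sum of k unit exponentials."""
    a_n = n ** (1.0 / alpha)
    x = rng.pareto(alpha, size=(reps, n)) + 1.0
    top = -np.sort(-x, axis=1)[:, :k_max] / a_n            # k largest, scaled
    gamma = np.cumsum(rng.exponential(size=(reps, k_max)), axis=1) ** (-1.0 / alpha)
    for k in range(k_max):
        q_emp = np.quantile(top[:, k], [0.5, 0.9])
        q_lim = np.quantile(gamma[:, k], [0.5, 0.9])
        print(f"k={k+1}: data quantiles {q_emp.round(3)}  limit quantiles {q_lim.round(3)}")

compare_order_stats()
```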


Point Processes — Exceedances

Point Process of Exceedances.

Sometimes it is convenient to consider the two-dimensional point process defined by

Nn = Σ_{t=1}^{n} ε_{(t/n, an^{-1} Xt)}.

For x > 0 fixed, the point process

Nn(· × (x, ∞)) = Σ_{t=1}^{n} ε_{(t/n, an^{-1} Xt)}(· × (x, ∞))

is the point process of exceedances of the level an x. This point process converges in distribution to a homogeneous Poisson process with rate (intensity) x^{-α}.

Bottom line: the point process of exceedances is approximately Poisson.

Point Processes — An Application

Cool application

If α < 1, then

Nn = Σ_{t=1}^{n} ε_{an^{-1} Xt} →d N = Σ_{k=1}^{∞} ε_{Γk^{-1/α}}

implies, by an application of the continuous mapping theorem, that

an^{-1} Σ_{t=1}^{n} Xt →d Σ_{k=1}^{∞} Γk^{-1/α}.

The limit random variable is positive stable with index α.


Extension to Stationary Time Series

Let (Xt) be a strictly stationary sequence with common df F ∈ D(G), i.e.,

F^n(an x + bn) → G(x).

Theorem. If (Xt) satisfies a mixing condition (like strong mixing) and

P(an^{-1}(Mn - bn) ≤ x) → H(x),

H nondegenerate, then there exists a θ ∈ (0, 1] such that

H(x) = G^θ(x).

The parameter θ is called the extremal index and is a measure of extremal clustering.

Extension to Stationary Time Series—Extremal Index

F^n(an x + bn) → G(x)  implies  P(an^{-1}(Mn - bn) ≤ x) → G^θ(x).

Properties

• θ < 1 implies clustering of exceedances.

• Suppose c is a threshold such that F^n(c) ≈ .95 and θ = .5. Then P(Mn ≤ c) ≈ .95^{1/2} ≈ .975.

• 1/θ is the mean cluster size of exceedances.

• In a certain sense, one can view θ as a measure of statistical efficiency relative to the iid case. That is, one needs 1/θ times as many observations to match the behavior of the iid case. Specifically,

P(M_{n/θ} ≤ x) ≈ F^n(x).


Extension to Stationary Time Series—Example

Example (max-moving average). Let (Zt) be iid with a Pareto distribution, i.e., P(Z1 > x) = x^{-α} for x ≥ 1, and set

Xt = max(Zt, φ Zt-1),  φ ∈ [0, 1].

Then

nP(X1 > x n^{1/α}) → (1 + φ^α) x^{-α}  and  F^n(an x) → exp(-(1 + φ^α) x^{-α}).

On the other hand,

P(n^{-1/α} Mn ≤ x) = P(n^{-1/α} max(Z0, . . . , Zn) ≤ x) → exp(-x^{-α}).

Thus θ = 1/(1 + φ^α).
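A simulation sketch (not from the slides) using a simple blocks estimator of the extremal index, θ̂ = (# blocks containing an exceedance)/(# exceedances); the estimator, threshold, and block length are illustrative choices, compared against 1/(1 + φ^α).

```python
import numpy as np

rng = np.random.default_rng(5)

def blocks_extremal_index(x, u, block_len):
    """Simple blocks estimator: (# blocks with an exceedance of u) / (# exceedances of u)."""
    n = len(x) // block_len * block_len
    blocks = x[:n].reshape(-1, block_len)
    k_blocks = (blocks.max(axis=1) > u).sum()
    n_exc = (x[:n] > u).sum()
    return k_blocks / n_exc if n_exc > 0 else np.nan

alpha, phi, n = 3.0, 1.0, 200000
z = rng.pareto(alpha, size=n + 1) + 1.0              # P(Z > x) = x^-alpha, x >= 1
x = np.maximum(z[1:], phi * z[:-1])                  # X_t = max(Z_t, phi*Z_{t-1})
u = np.quantile(x, 0.99)                             # high threshold
print("blocks estimate:", round(blocks_extremal_index(x, u, block_len=100), 3),
      " theory 1/(1+phi^alpha):", round(1.0 / (1.0 + phi ** alpha), 3))
```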

Extension to Stationary Time Series—Example

iid (Pareto, α = 3): θ = 1        max-moving average (φ = 1): θ = 1/2

[Figure: time series plots (n = 100) of the iid series and the max-moving-average series.]

Note that the cluster size is exactly 2 in this case.


Extension to Stationary Time Series—Mixing Conditions

Strong Mixing: (Xt) is strongly mixing if

α(k) = sup{ |P(A ∩ B) - P(A)P(B)| : A ∈ σ(Xs, s ≤ 0), B ∈ σ(Xs, s ≥ k) } → 0 as k → ∞.

Remarks:

• Since mixing is defined via σ-fields, measurable functions of (Xt) inherit the same mixing property. For example, if the stationary sequence (Xt) is strongly mixing, so are (|Xt|) and (Xt^2), with a rate function of similar order.

• If (α(k)) decays to zero at an exponential rate, (Xt) is strongly mixing with geometric rate, i.e., the memory between past and future dies out exponentially fast.

• Strong mixing is much stronger than Leadbetter's dependence condition D(un).

Extension to Stationary Time Series—Mixing Conditions

• The rate function is closely related to the ACVF of the process: if E|X1|^{2+δ} < ∞ for some δ > 0, then

|Cov(X0, Xh)| ≤ c α(h)^{δ/(2+δ)}.

• Many commonly used time series models, such as ARMA, GARCH, stochastic volatility, and Markov processes, are strongly mixing with a geometric rate.


Extension to Stationary Time Series—D’

Anti-clustering condition D'(un): Think of un as an x + bn.

limsup_{n} n Σ_{t=2}^{[n/k]} P(X1 > un, Xt > un) = O(1/k)  as k → ∞.

Theorem: If (Xt) satisfies D and D' and F ∈ D(G), then θ = 1 (i.e., no clustering).

Remarks:

• If (Xt) is iid, then the limsup of the sum is limsup_{n} (n^2/k) P^2(X1 > un) = O(1/k).

• If (Xt) is a stationary Gaussian process with ACF r(h) = o(1/log h), then D and D' hold, and so there is no clustering for such Gaussian processes.

Extension to Stationary Time Series—Example

IID N(0, 1/(1 - .9^2))        AR(1): Xt = .9 Xt-1 + Zt, (Zt) ~ IID N(0,1)

• Even though θ = 1, there appears to be some clustering for small n.

• Hsing, Hüsler, Reiss (1996) overcome this problem for Gaussian processes by considering a triangular array of rvs.

[Figure: sample paths (n = 200) of the iid N(0, 1/(1 - .9^2)) series and the AR(1) series.]


Extension to Stationary Time Series—Example

Max-moving average: Let (Zt) be iid with a Pareto distribution, i.e., P(Z1 > x) = x^{-α} for x ≥ 1, and set

Xt = max(Zt, φ Zt-1),  φ ∈ [0, 1].

Then, since (Xt) is 1-dependent, the mixing condition D is automatically satisfied. As for D', with un = an x chosen so that nP(X1 > un) → x^{-α},

limsup_{n} n Σ_{t=2}^{[n/k]} P(X1 > un, Xt > un)
  = lim_{n} n P(X1 > un, X2 > un) + O(1/k)
  = lim_{n} n P(max(Z1, φZ0) > un, max(Z2, φZ1) > un) + O(1/k)
  = lim_{n} n P(φ Z1 > un) + O(1/k)
  = φ^α x^{-α}/(1 + φ^α) + O(1/k),

which does not vanish as k → ∞, so D' fails and the exceedances cluster.

Point Process Example—baby steps

In particular, for one-dependent sequences,

P(X2 > x | X1 > x) → 1 - θ = φ^α/(1 + φ^α).

Point process convergence (max-moving average): With an = n^{1/α},

nP(Z1 > an x) → x^{-α}  and  nP(X1 > an x) → (1 + φ^α) x^{-α}.

Define the sequence of point processes by

Nn* = Σ_{t=1}^{n} ε_{an^{-1}(Zt-1, Zt)}.

From the convergence

Nn* →d N* = Σ_{k=1}^{∞} (ε_{(Γk^{-1/α}, 0)} + ε_{(0, Γk^{-1/α})}),

where, as before, Γk = E1 + · · · + Ek with (Ek) iid unit exponentials, one can show


Point Process Example—baby steps

Applying the continuous mapping theorem (need to be careful), we have

Nn = Σ_{t=1}^{n} ε_{an^{-1} Xt} = Σ_{t=1}^{n} ε_{an^{-1} max(Zt, φ Zt-1)}
   →d Σ_{k=1}^{∞} (ε_{max(Γk^{-1/α}, 0)} + ε_{max(0, φ Γk^{-1/α})})
   = Σ_{k=1}^{∞} (ε_{Γk^{-1/α}} + ε_{φ Γk^{-1/α}}) =: N.

[Figure: points of the limit process; red = Γk^{-1/α}, k = 1, . . . , 5; blue = .75 Γk^{-1/α}, k = 1, . . . , 5.]

Regular Variation — univariate case

Def: The random variable X is regularly varying with index α if

P(|X| > tx)/P(|X| > t) → x^{-α}  and  P(X > t)/P(|X| > t) → p,

or, equivalently, if

P(X > tx)/P(|X| > t) → p x^{-α}  and  P(X < -tx)/P(|X| > t) → q x^{-α},

where 0 ≤ p ≤ 1 and p + q = 1.

Equivalence:

X is RV(α) if and only if P(X ∈ t·)/P(|X| > t) →v μ(·)

(→v vague convergence of measures on R\{0}). In this case,

μ(dx) = (p α x^{-α-1} I(x > 0) + q α (-x)^{-α-1} I(x < 0)) dx.

Note: μ(tA) = t^{-α} μ(A) for every t > 0 and A bounded away from 0.


Regular Variation — univariate case

Another formulation (polar coordinates):

Define the ±1-valued rv θ, P(θ = 1) = p, P(θ = -1) = 1 - p = q. Then X is RV(α) if and only if

P(|X| > tx, X/|X| ∈ S)/P(|X| > t) → x^{-α} P(θ ∈ S),

or

P(|X| > tx, X/|X| ∈ ·)/P(|X| > t) →v x^{-α} P(θ ∈ ·)

(→v vague convergence of measures on S^0 = {-1, 1}).

Regular Variation — multivariate case

Multivariate regular variation of X = (X1, . . . , Xm): There exists a random vector θ ∈ S^{m-1} such that

P(|X| > tx, X/|X| ∈ ·)/P(|X| > t) →v x^{-α} P(θ ∈ ·)

(→v vague convergence on S^{m-1}, the unit sphere in R^m).

• P(θ ∈ ·) is called the spectral measure.

• α is the index of X.

Equivalence:

P(X ∈ t·)/P(|X| > t) →v μ(·),

where μ is a measure on R^m which satisfies, for x > 0 and A bounded away from 0,

μ(xA) = x^{-α} μ(A).


Regular Variation — multivariate case

Examples:

1. If X1 and X2 are iid RV(α), then X = (X1, X2) is multivariate regularly varying with index α and spectral distribution (assuming symmetry)

P(θ = πk/2) = 1/4,  k = 1, 2, 3, 4 (mass on the axes).

Interpretation: it is unlikely that X1 and X2 are very large at the same time.

[Figure: scatterplot of (Xt1, Xt2) for a realization of 10,000; independent components.]

2. If X1 = X2 > 0, then X = (X1, X2) is multivariate regularly varying with index α and spectral distribution

P(θ = π/4) = 1.

3. AR(1): Xt = .9 Xt-1 + Zt, {Zt} ~ IID t(3)

P(θ = arctan(.9)) = .9898,  P(θ = π/2) = .0102.

[Figure: time series plot of the AR(1) realization, n = 10,000.]


Figure: plot of (Xt, Xt+1) for a realization of 10,000 from Xt = .9 Xt-1 + Zt.

[Scatterplot: AR(1), X_{t+1} vs. X_t.]

Estimation of α and θ

The marginal distribution F for heavy-tailed data is often modeled using Pareto-like tails,

1 - F(x) = x^{-α} L(x),

for x large, where L(x) is a slowly varying function (L(xt)/L(x) → 1 as x → ∞). Now if X ~ F, then

P(log X > x) = P(X > exp(x)) ~ exp(-αx),

and hence log X has an approximate exponential distribution for large x. The spacings,

log(X_{(n-j)}) - log(X_{(n-j-1)}),  j = 0, 1, 2, . . . , m,

of the top order statistics behave like spacings from an exponential sample: they are approximately independent and Exp(α(j+1)) distributed. This suggests estimating α^{-1} by

α̂^{-1} = (1/m) Σ_{j=0}^{m-1} (j+1)(log X_{(n-j)} - log X_{(n-j-1)}) = (1/m) Σ_{j=0}^{m-1} (log X_{(n-j)} - log X_{(n-m)}).


Hill's estimate of α

Def: The Hill estimate of α for heavy-tailed data with distribution given by

1 - F(x) = x^{-α} L(x)

is obtained from

α̂^{-1} = (1/m) Σ_{j=0}^{m-1} (j+1)(log X_{(n-j)} - log X_{(n-j-1)}) = (1/m) Σ_{j=0}^{m-1} (log X_{(n-j)} - log X_{(n-m)}).

The asymptotic variance of this estimate of α is α^2/m and is estimated by α̂^2/m.

(See also GPD = generalized Pareto distribution.)

Hill's estimate of α

[Figure: Hill plot (α̂ versus m, m = 0 to 2000) for the independent-components example.]

For a bivariate series, we will estimate α for the univariate series formed by the Euclidean norm of the two components.


Hill's estimate of α

[Figure: Hill plot (α̂ versus m, m = 0 to 2000) for the AR(1) example.]

Estimation of the spectral distribution of θ

Based on the relation

P(|X| > tx, X/|X| ∈ ·)/P(|X| > t) →v x^{-α} P(θ ∈ ·),

a naïve estimate of the distribution of θ is based on the angular components Xt/|Xt| in the sample. One simply uses the empirical distribution of those angular pieces for which the modulus |Xt| exceeds some large threshold. In the examples given below, we use a kernel density estimate of these angular components for those observations whose moduli exceed some large threshold. Here we only consider two components, i.e., θ is one-dimensional.
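A minimal sketch of the naïve estimator just described, assuming numpy and scipy are available; the AR(1)-with-t(3)-noise data and the 97.5% threshold are illustrative, not the choices used for the figures.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(7)

# AR(1) with heavy-tailed t(3) noise, as in the earlier example.
n, phi = 100000, 0.9
z = rng.standard_t(3, size=n)
x = np.empty(n)
x[0] = z[0]
for t in range(1, n):
    x[t] = phi * x[t - 1] + z[t]

# Bivariate pairs (X_t, X_{t+1}); keep angles of pairs whose modulus is large.
pairs = np.column_stack([x[:-1], x[1:]])
radius = np.hypot(pairs[:, 0], pairs[:, 1])
theta = np.arctan2(pairs[:, 1], pairs[:, 0])        # angle in (-pi, pi]
big = radius > np.quantile(radius, 0.975)

kde = gaussian_kde(theta[big])                      # kernel density estimate of theta
grid = np.linspace(-np.pi, np.pi, 9)
print(np.round(kde(grid), 3))                       # mass piles up near arctan(.9) and pi/2
```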


Estimation of the spectral distribution of θ

[Figure: kernel density estimate of θ on (-π, π) and scatterplot of (x_1, x_2) for the independent-components example.]

Estimation of θ

[Figure: kernel density estimate of θ and scatterplot of X_{t+1} vs. X_t for the AR(1) example.]

Vertical lines on the right are at arctan(.9) and arctan(.9) - π.


Examples of Processes that are Regularly Varying

ARCH(1): Xt = (a0 + a1 X_{t-1}^2)^{1/2} Zt,  {Zt} ~ IID.

α is found by solving E|a1 Z^2|^{α/2} = 1:

a1:  .312  .577  1.00  1.57
α:   8.00  4.00  2.00  1.00

Distribution of θ:

P(θ ∈ ·) = E{ ||(B, Z)||^α I(arg((B, Z)) ∈ ·) } / E||(B, Z)||^α,

where P(B = 1) = P(B = -1) = .5.
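For Gaussian Z the tail-index equation E|a1 Z^2|^{α/2} = 1 reduces to (2 a1)^{α/2} Γ((α+1)/2)/√π = 1, which can be solved numerically. A sketch (assuming Gaussian noise and scipy available) reproducing the table above:

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import gamma

def arch1_tail_index(a1):
    """Solve E|a1 Z^2|^(alpha/2) = 1 for Z ~ N(0,1), i.e.
    (2*a1)^(alpha/2) * Gamma((alpha+1)/2) / sqrt(pi) = 1, for the positive root."""
    f = lambda alpha: (2 * a1) ** (alpha / 2) * gamma((alpha + 1) / 2) / np.sqrt(np.pi) - 1.0
    return brentq(f, 0.1, 50.0)

for a1 in (0.312, 0.577, 1.00, 1.57):
    print(f"a1={a1:.3f}  alpha={arch1_tail_index(a1):.2f}")
```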

Examples of Processes that are Regularly Varying

Example of ARCH(1): a0 = 1, a1 = 1, α = 2, Xt = (a0 + a1 X_{t-1}^2)^{1/2} Zt, {Zt} ~ IID.

[Figure: scatterplot of (Xt, Xt+1) and estimated distribution of θ for a realization of 10,000.]


Examples of Processes that are Regularly Varying

Example: SV model Xt = σt Zt.

Suppose Zt ~ RV(α) and

log σ_t^2 = Σ_j ψ_j ε_{t-j},  Σ_j ψ_j^2 < ∞,  {ε_t} ~ IID N(0, σ_ε^2).

Then Zn = (Z1, . . . , Zn)' is regularly varying with index α and so is

Xn = (X1, . . . , Xn)' = diag(σ1, . . . , σn) Zn,

with spectral distribution concentrated on (1, 0), (0, 1).

[Figure: scatterplot of (Xt, Xt+1) for a realization of 10,000.]

Point Process Convergence

Theorem (Davis & Hsing '95, Davis & Mikosch '97). Let {Xt} be a stationary sequence of random m-vectors. Suppose

(i) the finite-dimensional distributions are jointly regularly varying (let (θ_{-k}, . . . , θ_k) be the vector in S^{(2k+1)m-1} in the definition);

(ii) the mixing condition A(an) or strong mixing holds;

(iii) lim_{k→∞} limsup_{n→∞} P( max_{k ≤ |t| ≤ r_n} |Xt| > an y  |  |X0| > an y ) = 0.

Then the extremal index

θ = lim_{k→∞} E( |θ_0^{(k)}|^α - max_{1 ≤ j ≤ k} |θ_j^{(k)}|^α )_+ / E|θ_0^{(k)}|^α

exists. If θ > 0, then

Nn = Σ_{t=1}^{n} ε_{an^{-1} Xt} →d N = Σ_{i=1}^{∞} Σ_{j=1}^{∞} ε_{Pi Qij}.


Point process convergence (cont.)

• (Pi) are the points of a Poisson process on (0, ∞) with intensity function ν(dy) = θ α y^{-α-1} dy.

• Σ_j ε_{Qij}, i ≥ 1, are iid point processes with distribution Q, and Q is the weak limit, as k → ∞, of the distribution of the cluster of points determined by the angular vector (θ_{-k}^{(k)}, . . . , θ_k^{(k)}).

Remarks:

1. GARCH and SV processes satisfy the conditions of the theorem.

2. The limit distribution for sample extremes and the sample ACF follows from this theorem.

Application to GARCH and SV Models

Setup

Xt = σt Zt,  {Zt} ~ IID(0,1)

Xt is RV(α)

Choose {an} s.t. nP(Xt > an) → 1, so that for an iid sequence with the same marginal distribution, P(an^{-1} Mn ≤ x) → exp{-x^{-α}}.

Then, with Mn = max{X1, . . . , Xn},

(i) GARCH: P(an^{-1} Mn ≤ x) → exp{-θ x^{-α}}, where θ is the extremal index (0 < θ < 1).

(ii) SV model: P(an^{-1} Mn ≤ x) → exp{-x^{-α}}, i.e., extremal index θ = 1, no clustering.


Application to Linear Processes

Suppose (Xt) is the linear process

Xt = Σ_j ψ_j Z_{t-j},  (Zt) ~ IID RV(α).

If an is chosen such that nP(|Zt| > an) → 1, then

nP(|Xt| > an x) → Σ_j |ψ_j|^α x^{-α}

and

Nn = Σ_{t=1}^{n} ε_{an^{-1} Xt} →d N = Σ_{k=1}^{∞} Σ_j ε_{ψ_j Γk^{-1/α}}.

That is, each Poisson point Γk^{-1/α} gives rise to a cluster of points (deterministic) given by the coefficients of the filter, i.e., ψ_j Γk^{-1/α}. The extremal index and other extremal properties can be determined from this limit result.

Application to Crystal River

River flow rate for the Crystal River, located in the mountains of western Colorado (see Cooley et al. (2007)). After deseasonalizing the data, we obtain 728 weekly observations from Oct 1, 1990 to Oct 1, 2005.

[Figure: the deseasonalized Crystal River series and its sample ACF (lags 0-40).]


Application to Crystal River

Estimates of α and the distribution of θ for the bivariate pairs (X_{t-1}, Xt).

[Figure: Hill plot, scatterplot of Xt vs. X_{t-1}, and kernel density estimate of θ; vertical lines at π/4 and π/4 - π.]

The Extremogram

The extremogram of a stationary time series (Xt) can be viewed as the analogue of the correlogram for measuring dependence in extremes (see Davis and Mikosch (2008)).

Definition: For two sets A and B bounded away from 0, the extremogram is defined as

ρ_{A,B}(h) = lim_{n→∞} P(an^{-1} X0 ∈ A, an^{-1} Xh ∈ B) / P(an^{-1} X0 ∈ A).

In many examples, this can be computed explicitly. If one takes A = B = (1, ∞), then

ρ_{A,B}(h) = lim_{x→∞} P(Xh > x | X0 > x) = λ(X0, Xh),

often called the extremal dependence coefficient (λ = 0 means independence or asymptotic independence).


The Extremogram

The extremogram is estimated via the empirical extremogram, defined by

ρ̂_{A,B}(h) = [ (m/n) Σ_{t=1}^{n-h} I{am^{-1} Xt ∈ A, am^{-1} X_{t+h} ∈ B} ] / [ (m/n) Σ_{t=1}^{n} I{am^{-1} Xt ∈ A} ],

where m → ∞ with m/n → 0. Note that the limit of the expectation of the numerator is

m P(am^{-1} X0 ∈ A, am^{-1} Xh ∈ B) → μ(A × B),

where μ is the measure defined in the statement of regular variation. Hence the empirical estimate is asymptotically unbiased. Under suitable mixing conditions, a CLT for the empirical estimate is established in Davis and Mikosch (2008).
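A minimal sketch of the empirical extremogram for A = B = (am, ∞), taking am to be a high sample quantile; the threshold choice and the max-moving-average test series are illustrative, not the authors' implementation.

```python
import numpy as np

def empirical_extremogram(x, lags, p=0.95):
    """Sample extremogram for A = B = (a_m, inf), a_m the p-th sample quantile:
    rho_hat(h) = #{t : x_t > a_m and x_{t+h} > a_m} / #{t : x_t > a_m}."""
    a_m = np.quantile(x, p)
    exceed = x > a_m
    denom = exceed.sum()
    out = []
    for h in lags:
        num = np.logical_and(exceed[:-h], exceed[h:]).sum() if h > 0 else denom
        out.append(num / denom)
    return np.array(out)

rng = np.random.default_rng(8)
# Max-moving average X_t = max(Z_t, Z_{t-1}) with Pareto(3) noise: lag-1 extremal dependence.
z = rng.pareto(3.0, size=20001) + 1.0
x = np.maximum(z[1:], z[:-1])
print(np.round(empirical_extremogram(x, lags=range(0, 6)), 3))
```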


Application to Crystal River

Extremogram for the Crystal River, A = B = (1, ∞).

[Figure: sample extremogram (lags 0-40).]

Application to Crystal River

Fit an AR(6) model to the data (this removes all appreciable autocorrelation in the data). We then estimate the distribution of θ and the extremogram based on the residuals.

[Figure: extremogram and kernel density estimate of θ for the AR(6) residuals; vertical lines at -π/2, 0, and π/2.]


Application to Crystal River

There is still a touch of autocorrelation in the absolute values and squares of the residuals. We remove this by fitting a GARCH model to these residuals. The estimated degrees of freedom for the noise was 3.43.

[Figure: Hill plot for the GARCH-filtered residuals.]

Bonus Example

[Figure: wind speed (knots) at Kilkenny, 1/1/61-1/17/78.]


Bonus Example

[Figure: wind speed at Kilkenny adjusted, and adjusted for volatility, with the ACF of the absolute values of each series (lags 0-30).]