inference for multiple change-points in time series via ... for multiple change-points in time...

46
Inference for multiple change-points in time series via scan statistics Chun Yip Yau Chinese University of Hong Kong Joint with Zifeng Zhao (Univ. of Wisconsin-Madison) Research supported in part by HKSAR-RGC-GRF Chun Yip Yau (CUHK) LR Scan for Change Points 15 Jan 2014 1 / 46

Upload: dinhdiep

Post on 26-May-2018

220 views

Category:

Documents


0 download

TRANSCRIPT

Inference for multiple change-points

in time series via scan statistics

Chun Yip YauChinese University of Hong Kong

Joint with Zifeng Zhao (Univ. of Wisconsin-Madison)

Research supported in part by HKSAR-RGC-GRF

Chun Yip Yau (CUHK) LR Scan for Change Points 15 Jan 2014 1 / 46

...

Content

1 Motivation

2 Likelihood Ratio Scan for Change-Points DetectionChange-point Detection by Scan StatisticConsistent Change-point Estimation by Model SelectionConstruction of Confidence Intervals

3 Implementation Issue and Computational Complexity

4 Simulation Studies

5 Applications

6 Conclusions

Chun Yip Yau (CUHK) LR Scan for Change Points 15 Jan 2014 2 / 46

...

Motivation

Where are the change-points?

Chun Yip Yau (CUHK) LR Scan for Change Points 15 Jan 2014 3 / 46

...

Motivation

Where are the change-points?

Literatures:

|τ − τ0| = Op(1) .

⇒To locate a change-point,

We don’t really need too many data.

Chun Yip Yau (CUHK) LR Scan for Change Points 15 Jan 2014 4 / 46

...

Change-point Estimation in Local Windows

Toy example:

Xt =

et if t < 100

1 + et if t ≥ 100, et

i.i.d.∼ N(0, 1) .

Observations: X1, X2, . . . , X200.Estimations:For Y = X91:110;X81:120;X71:140; . . . ;X11:190;X1:200

Use CUSUM statistic

τ = arg maxk

k∑j=1

yj − ky

.

Chun Yip Yau (CUHK) LR Scan for Change Points 15 Jan 2014 5 / 46

...

Change-point Estimation in Local Windows

0 50 100 150 200

−2−1

01

2

0 50 100 150 200

−20

12

3

0 50 100 150 200

−20

12

3

0 50 100 150 200

−20

12

3

0 50 100 150 200

−20

12

3

0 50 100 150 200

−20

12

3

0 50 100 150 200

−20

12

3

0 50 100 150 200

−20

12

3

0 50 100 150 200

−20

12

3

0 50 100 150 200

−20

12

3

Findings:

In 80/100 replications, the 10 τs are exactly the same.

For the other 20 replications, the average s.d. of the 10 τs is 3.43.

Chun Yip Yau (CUHK) LR Scan for Change Points 15 Jan 2014 6 / 46

...

Main Objectives:

Literatures:

Scan Statistics: Naus (1965), Glaz (2001) . . .

Testing for Change-point: Bauer & Hackl (78, 80), Chan &Walther (13) . . .

Location estimation: Fryzlewicz (14), Dette (14)

Objectives: By using a Likelihood Ratio Scan Statistic,

Consistent estimation of multiple change-points in time series.

Construction of confidence intervals.

Fast and easy to implement

Chun Yip Yau (CUHK) LR Scan for Change Points 15 Jan 2014 7 / 46

...

Content

1 Motivation

2 Likelihood Ratio Scan for Change-Points DetectionChange-point Detection by Scan StatisticConsistent Change-point Estimation by Model SelectionConstruction of Confidence Intervals

3 Implementation Issue and Computational Complexity

4 Simulation Studies

5 Applications

6 Conclusions

Chun Yip Yau (CUHK) LR Scan for Change Points 15 Jan 2014 8 / 46

...

Content

1 Motivation

2 Likelihood Ratio Scan for Change-Points DetectionChange-point Detection by Scan StatisticConsistent Change-point Estimation by Model SelectionConstruction of Confidence Intervals

3 Implementation Issue and Computational Complexity

4 Simulation Studies

5 Applications

6 Conclusions

Chun Yip Yau (CUHK) LR Scan for Change Points 15 Jan 2014 9 / 46

...

Setting and Notations

Data: Piecewise stationary autoregressive time series Xtnt=1

Y

0 100 200 300 400 500 600 700

-4-2

02

46

…, , , (1)

(n)The j-th stationary segment: Yt,j , τj−1 < t ≤ τj

Yt,j = φj0 + φj1Yt−1,j + · · ·+ φj,pjYt−pj ,j + σjεt , εti.i.d.∼ N(0, 1) .

Parameters:

Break-points: 0 = τ0, τ1, τ2, ..., τm, τm+1 = nRelative location of breaks: λj :=

τjn . Assume |λj+1 − λj | > ελ.

AR parameter vectors: θj := (φj,0, φj,1, . . . , φj,pj , σ2j ).

Chun Yip Yau (CUHK) LR Scan for Change Points 15 Jan 2014 10 / 46

...

Outline

1 Motivation

2 Likelihood Ratio Scan for Change-Points DetectionChange-point Detection by Scan StatisticConsistent Change-point Estimation by Model SelectionConstruction of Confidence Intervals

3 Implementation Issue and Computational Complexity

4 Simulation Studies

5 Applications

6 Conclusions

Chun Yip Yau (CUHK) LR Scan for Change Points 15 Jan 2014 11 / 46

...

Likelihood Ratio Scan Statistic (LRSS)

The Scan Statistic: (h < min(λi − λi−1)/4)

Sh(t) = logL(t− h+ 1 : t)L(t+ 1 : t+ h)

L(t− h+ 1 : t+ h)

Chun Yip Yau (CUHK) LR Scan for Change Points 15 Jan 2014 12 / 46

...

Likelihood Ratio Scan Statistic (LRSS)

Conditional log-Likelihood of AR model for data y = (y1, . . . , yn):

l(θ,y) = −∑

(yt − φ0 − φ1yt−1 − · · · − φpyt−p)2

2σ2− 1

2log σ2

l(θ,y) = −n2− 1

2log σ2 ,

where σ2 = γ(0)− γT φ, φ = Γ−1γ.

Scan Statistics

Sh(t) = log σ2t −1

2log σ21,t −

1

2log σ22,t .

Chun Yip Yau (CUHK) LR Scan for Change Points 15 Jan 2014 13 / 46

...

Likelihood Ratio Scan Statistic (LRSS)

The set of Local Change-Point Estimators:

L(1) =

t : Sh(t) = max

u∈[t−h,t+h]Sh(u)

Chun Yip Yau (CUHK) LR Scan for Change Points 15 Jan 2014 14 / 46

...

Likelihood Ratio Scan Statistic (LRSS)

Theorem 1

Let

True CP: Lo = (τ o1 , . . . , τomo)

Local CP estimates: L(1) = τ (1)1 , τ(1)2 , . . . , τ

(1)

m(1)minj=1,...,mo(λ

oj+1 − λoj) > ελ

There exists some d > 0 such that for h ≥ d log n and h < nελ/4,

P

(maxτoj ∈L0

minτ(1)k ∈L(1)

|τ oj − τ(1)k | < h

)→ 1 .

Possibly overestimation: m(1) > mo

Each of the true τ oj is surrounded by a τ(1)k in a h-neighborhood.

Chun Yip Yau (CUHK) LR Scan for Change Points 15 Jan 2014 15 / 46

...

Outline

1 Motivation

2 Likelihood Ratio Scan for Change-Points DetectionChange-point Detection by Scan StatisticConsistent Change-point Estimation by Model SelectionConstruction of Confidence Intervals

3 Implementation Issue and Computational Complexity

4 Simulation Studies

5 Applications

6 Conclusions

Chun Yip Yau (CUHK) LR Scan for Change Points 15 Jan 2014 16 / 46

...

Estimation of Change-points

From the scanning step, we have L(1) = τ (1)1 , τ(1)2 , . . . , τ

(1)m with

P

(maxτoj ∈L0

minτ(1)k ∈L(1)

|τ oj − τ(1)k | < h

)→ 1 .

Infeasible: (m, L, p) = arg minL⊂1,2,...,n

IC(m,L,p)

Feasible: (m(2), L(2), p(2)) = arg minL⊂L(1)

IC(m,L,p)

Examples:

AIC(m,L,p) =m+1∑j=1

nj

2 log(σ2j ) + 2(m+

m+1∑j=1

(pj + 2))

BIC(m,L,p) =m+1∑j=1

nj

2 log(σ2j ) + (m+

m+1∑j=1

(pj + 2)) log n

MDL(m,L,p) =m+1∑j=1

nj

2 log(σ2j ) +

m+1∑j=1

pj+22 log(nj) + (m+ 1) log(n)

Chun Yip Yau (CUHK) LR Scan for Change Points 15 Jan 2014 17 / 46

...

Estimation of Change-points

Theorem 2

Under the setting of Theorem 1, if

IC(m,L,p) =

m+1∑j=1

nj2

log(σ2j ) + ωn(m+

m+1∑j=1

pj)

with ωn = o(n) and ωnlog logn →∞, then

m(2) p→ mo , P

(max

j=1,...,mo|τ (2)j − τ

oj | < h

)→ 1 , max

j=1,...,mo|p(2)j − p

oj |

p→ 0 .

h > d log n→∞⇒ no guarantee that |τ (2)j − τ oj | = Op(1).

τ oj ∈(τ(2)j − h, τ

(2)j + h

]⇒ improvement by extended local window

Chun Yip Yau (CUHK) LR Scan for Change Points 15 Jan 2014 18 / 46

...

Outline

1 Motivation

2 Likelihood Ratio Scan for Change-Points DetectionChange-point Detection by Scan StatisticConsistent Change-point Estimation by Model SelectionConstruction of Confidence Intervals

3 Implementation Issue and Computational Complexity

4 Simulation Studies

5 Applications

6 Conclusions

Chun Yip Yau (CUHK) LR Scan for Change Points 15 Jan 2014 19 / 46

...

Final Estimates and Confidence Intervals

From the model selection step we have

P(τ oj ∈

(τ(2)j − h, τ

(2)j + h

])→ 1

Define the extended local window

Ej =(τ(2)j − 2h, τ

(2)j + 2h

]⇒ τ oj is within

(14 ,

34

)of Ej .

Ej contains one τojNumber of observation before/after τoj →∞

⇒ Final estimate and C.I. can be obtained from Ej

Chun Yip Yau (CUHK) LR Scan for Change Points 15 Jan 2014 20 / 46

...

Final estimates and Confidence Intervals

Final Estimates of Change Points

τ(3)j = arg max

τ∈(τ (2)j −h,τ(2)j +h]

logL(τ(2)j − 2h+ 1 : τ)L(τ + 1 : τ

(2)j + 2h)

Theorem 3

Under the setting of Theorem 1 and 2,

maxj=1,...,mo

|τ (3)j − τoj | = Op(1) .

Chun Yip Yau (CUHK) LR Scan for Change Points 15 Jan 2014 21 / 46

...

Final Estimates and Confidence Intervals

Theorem 4

Asymptotic distribution of Change Points (Ling (2013))

τ(3)j − τ

oj

d→ arg maxk

Wk ,

where

Wk =

∑τoj +k

t=τoj(lt(θ

o1)− lt(θo2)) k > 0

0 k = 0∑τoj −1t=τoj +k

(lt(θo2)− lt(θo1)) k < 0

is the double-sided random walk.

Unknown closed form expression for the c.d.f. of Wk.

Chun Yip Yau (CUHK) LR Scan for Change Points 15 Jan 2014 22 / 46

...

Final Estimates and Confidence Intervals

Theorem 5

Approximation (Ling 2013):

(dΣd)2(dΩd)−1(τ(3)j − τ

oj )

d→ arg maxr∈R

B(r)− 1

2|r| ,

where d = θ1 − θ2 , Dt(θ) = ∂lt(θ)/∂θ , D =∑

t∈Ej Dt(θ)/4h ,

Σ =1

4h

∑t∈Ej

∂2lt(θ)

∂θ∂θ′

∣∣∣∣θ2

, Ω =1

4h

∑t∈Ej

(Dt(θ2)− D)(Dt(θ2)− D)′ .

Confidence Interval for τ oj :

τ(3)j ±

(1 +

[∆Fα/2

])where ∆ = (dΣd)−2(dΩd) , P

(∣∣∣∣arg maxr∈R

B(r)− 12 |r|∣∣∣∣ < Fα/2

)= 1− α .

Chun Yip Yau (CUHK) LR Scan for Change Points 15 Jan 2014 23 / 46

...

Content

1 Motivation

2 Likelihood Ratio Scan for Change-Points DetectionChange-point Detection by Scan StatisticConsistent Change-point Estimation by Model SelectionConstruction of Confidence Intervals

3 Implementation Issue and Computational Complexity

4 Simulation Studies

5 Applications

6 Conclusions

Chun Yip Yau (CUHK) LR Scan for Change Points 15 Jan 2014 24 / 46

...

Three-step procedure for change-point estimation

Summary

Step 1: Detection by Likelihood Ratio Scan Statistics.

Step 2: Model Selection by Information Criterion.

Step 3: Construction of Confidence Intervals.

Chun Yip Yau (CUHK) LR Scan for Change Points 15 Jan 2014 25 / 46

...

Three-step procedure for change-point estimation

Step 1: Detection by Likelihood Ratio Scan Statistics.

Computational Complexity

O(h) for computing Sh(t).

n− 2h replicates of Sh(t), (t = h+ 1, . . . , n− h).

Sorting for Local Change-point estimator: Negligible.

⇒ Total: O(nh).

Chun Yip Yau (CUHK) LR Scan for Change Points 15 Jan 2014 26 / 46

...

Three-step procedure for change-point estimation

Step 2: Model Selection Approach for consistent change-point estimation.

(m(2), L(2), p(2)) = arg minL⊂L(1)

IC(m,L,p)

Computational Complexity

Exact Maximization: O(m(1))2 ×O(n) by dynamic programming.

Short cut:

Sort Sh(t)t∈L(1) and consider arg minB largest Sh(·)

IC

O(B2)×O(n)

⇒ Total: O(B2n).

Chun Yip Yau (CUHK) LR Scan for Change Points 15 Jan 2014 27 / 46

...

Three-step procedure for change-point estimation

Step 3: Final Estimates and Confidence Intervals.

τ(3)j = arg max

τ∈(τ (2)j −h,τ(2)j +h]

logL(τ(2)j − 2h+ 1 : τ)L(τ + 1 : τ

(2)j + 2h)

C.I. = τ(3)j ±

(1 +

[∆Fα/2

])Computational Complexity

Find τ(3)j : 2h×O(h).

Construct C.I.: O(h).

⇒ Total: O(h2).

Overall Computational Complexity:

O(nh) +O(B2n) +O(h2) = O(nh) =O(n log n)

Chun Yip Yau (CUHK) LR Scan for Change Points 15 Jan 2014 28 / 46

...

Content

1 Motivation

2 Likelihood Ratio Scan for Change-Points DetectionChange-point Detection by Scan StatisticConsistent Change-point Estimation by Model SelectionConstruction of Confidence Intervals

3 Implementation Issue and Computational Complexity

4 Simulation Studies

5 Applications

6 Conclusions

Chun Yip Yau (CUHK) LR Scan for Change Points 15 Jan 2014 29 / 46

...

Sensitivity Analysis

In the 3-step procedure, there is only one tuning parameter:

h = d log n

Sensitivity analysis for dHow does d affect m(1) = |L(1)|?How does d affect the estimation results?

Model we use:

Xt =

0.9Xt−1 + εt, if 1 ≤ t ≤ 0.5n

1.69Xt−1 − 0.81Xt−2 + εt, if 0.5n+ 1 ≤ t ≤ 0.75n

1.32Xt−1 − 0.81Xt−2 + εt, if 0.75n+ 1 ≤ t ≤ n

where εti.i.d.∼ N(0, 1).

Chun Yip Yau (CUHK) LR Scan for Change Points 15 Jan 2014 30 / 46

...

Sensitivity Analysis

n = 1024, Rep=100

d h = d logn m(2) = 2 |τ(2)j − τ(o)j | < h Average MDL m(1) Time

2 13 100 54 1011.139 34.95 8.37254 27 98 80 1013.365 17.65 4.2676 41 98 85 1015.85 11.4 3.05478 55 98 92 1017.205 8.39 2.690510 69 100 91 1017.182 6.42 2.576412 83 99 93 1018.079 5 2.430514 97 100 99 1016.94 3.93 2.28616 110 98 96 1015.052 3.41 2.266518 124 98 98 1015.527 2.88 2.130620 138 100 98 1015.273 2.73 2.05622 152 100 99 1013.48 2.51 2.11124 166 99 99 1013.638 2.27 1.988226 180 98 98 1012.462 2.07 1.805828 194 98 98 1012.07 1.99 1.822530 207 98 98 1011.397 1.98 1.851832 221 91 91 1012.718 1.91 1.612734 235 76 76 1020.511 1.76 1.722836 249 52 52 1029.787 1.52 1.80538 263 3 3 1049.257 1.03 1.358240 277 1 1 1049.649 1.01 1.3

Once h is not too large, mo can be estimated consistently.

Sudden increase in MDL when h > min(τ oj+1 − τ oj ).

h ∈ (50, 100) works well in most cases.

Chun Yip Yau (CUHK) LR Scan for Change Points 15 Jan 2014 31 / 46

...

Simulation Experiment 1

Piecewise AR model

Xt =

0.9Xt−1 + εt, if 1 ≤ t ≤ 0.5n

1.69Xt−1 − 0.81Xt−2 + εt, if 0.5n+ 1 ≤ t ≤ 0.75n

1.32Xt−1 − 0.81Xt−2 + εt, if 0.75n+ 1 ≤ t ≤ n

where εti.i.d.∼ N(0, 1).

Chun Yip Yau (CUHK) LR Scan for Change Points 15 Jan 2014 32 / 46

...

Simulation Experiment 1

100 replications, τ1=0.5, τ2=0.75

LRSM2 τ1 τ2 Coverage Time

N=1024 0.500(0.009) 0.749(0.013) 98 2.24N=2048 0.499(0.008) 0.750(0.003) 99 5.17N=4096 0.499(0.003) 0.750(0.001) 98 11.23N=8192 0.500(0.001) 0.750(0.001) 99 24.69N=16384 0.500(0.001) 0.750(0.001) 99 58.72N=32768 0.500(0.0002) 0.750(0.0002) 99 129.34

Auto-PARM τ1 τ2 Coverage Time

N=1024 0.496(0.013) 0.750(0.008) 100 172.05N=2048 0.498(0.006) 0.751(0.005) 100 202.18N=4096 0.498(0.004) 0.750(0.003) 100 371.41N=8192 0.500(0.002) 0.750(0.002) 100 776.82N=16384 0.499(0.002) 0.750(0.001) 100 1513.58N=32768 0.500(0.001) 0.750(0.001) 98 2996.56

PELT (Rep = 5) τ1 τ2 Coverage Time

N=1024 0.501(0.006) 0.750(0.005) 100 1141.92N=2048 0.498(0.003) 0.751(0.005) 100 2911.11

Chun Yip Yau (CUHK) LR Scan for Change Points 15 Jan 2014 33 / 46

...

Simulation Experiment 2

Piecewise AR/MA

Xt =

−0.9Xt−1 + εt + 0.7εt−1, if 1 ≤ t ≤ 0.5n

0.9Xt−1 + εt, if 0.5N + 1 ≤ t ≤ 0.75n

εt − 0.7εt−1, if 0.75n+ 1 ≤ t ≤ n

where εti.i.d.∼ N(0, 1).

Chun Yip Yau (CUHK) LR Scan for Change Points 15 Jan 2014 34 / 46

...

Simulation Experiment 2

100 replications, τ1=0.5, τ2=0.75

LRSS τ1 τ2 Coverage Time

N=1024 0.498(0.006) 0.751(0.002) 100 2.21N=2048 0.499(0.003) 0.751(0.001) 100 4.90N=4096 0.499(0.001) 0.750(0.001) 100 11.03N=8192 0.500(0.001) 0.750(0.0002) 100 23.78N=16384 0.500(0.0004) 0.750(0.0001) 100 54.99N=32768 0.500(0.0001) 0.750(0.0001) 100 117.45

Auto-PARM τ1 τ2 Coverage Time

N=1024 0.496(0.008) 0.752(0.004) 100 162.85N=2048 0.499(0.005) 0.751(0.002) 100 305.88N=4096 0.499(0.003) 0.751(0.001) 100 509.16N=8192 0.499(0.001) 0.750(0.001) 99 1204.02N=16384 0.500(0.001) 0.750(0.0004) 98 2105.29N=32768 0.500(0.0005) 0.750(0.0003) 84 4924.47

Chun Yip Yau (CUHK) LR Scan for Change Points 15 Jan 2014 35 / 46

...

Simulation Experiment 3

Piecewise AR

Xt =

0.9Xt−1 + εt, if 1 ≤ t ≤ 0.5n

1.69Xt−1 − 0.81Xt−2 + εt, if 0.5n+ 1 ≤ t ≤ 0.75n

1.32Xt−1 − 0.81Xt−2 + εt, if 0.75n+ 1 ≤ t ≤ n

where εti.i.d.∼ N(0, 1).

Chun Yip Yau (CUHK) LR Scan for Change Points 15 Jan 2014 36 / 46

...

Simulation Experiment 3

Coverage accuracy of confidence intervals:

n=1024, 1000 replications

τoj Mean of τ (3) Mean of 90% C.I. Coverage Probability

512 512 [488, 536] 91.7%768 768 [760, 775] 90.3%

Chun Yip Yau (CUHK) LR Scan for Change Points 15 Jan 2014 37 / 46

...

Content

1 Motivation

2 Likelihood Ratio Scan for Change-Points DetectionChange-point Detection by Scan StatisticConsistent Change-point Estimation by Model SelectionConstruction of Confidence Intervals

3 Implementation Issue and Computational Complexity

4 Simulation Studies

5 Applications

6 Conclusions

Chun Yip Yau (CUHK) LR Scan for Change Points 15 Jan 2014 38 / 46

...

Electroencephalogram (EEG) Time Series

Recorded from a patient diagnosed with left temporal lobe epilepsy.

Data collection:

Sampling rate: 100Hz,Recording period: 5 minutes and 28 seconds,Sample size: n=32,768.

Investigated by Ombao et al. (2000) and Davis, Lee andRodriguez-Yam (2006).

Chun Yip Yau (CUHK) LR Scan for Change Points 15 Jan 2014 39 / 46

...

Electroencephalogram (EEG) Time Series

Results of EEG data analysis: n = 32768

Three-step Auto-PARM PELT

No. of τ 14 12 33MDL 114195 114335 114074Time 441s 5726s > 1 week (186.9 hours)

Chun Yip Yau (CUHK) LR Scan for Change Points 15 Jan 2014 40 / 46

...

Electroencephalogram (EEG) Time Series

PELT (Killick, Fearnhead, Eckley (2012))

AUTO-PARM (Davis, Lee and Rodriguez-Yam (2005))

Three-step procedure

Chun Yip Yau (CUHK) LR Scan for Change Points 15 Jan 2014 41 / 46

...

IBM return data

From Jan 1926 to Dec 2008.

996 data points.

Time for computation: 4.2s

Chun Yip Yau (CUHK) LR Scan for Change Points 15 Jan 2014 42 / 46

...

IBM return

Results:

1 2

τ 41 174C.I. [19, 63] [168, 180]

Length 44 12

Interpretations:

July 1927 - Oct 1931: Great Depression.

Dec 1939 - Dec 1940: World War II.Chun Yip Yau (CUHK) LR Scan for Change Points 15 Jan 2014 43 / 46

...

Content

1 Motivation

2 Likelihood Ratio Scan for Change-Points DetectionChange-point Detection by Scan StatisticConsistent Change-point Estimation by Model SelectionConstruction of Confidence Intervals

3 Implementation Issue and Computational Complexity

4 Simulation Studies

5 Applications

6 Conclusions

Chun Yip Yau (CUHK) LR Scan for Change Points 15 Jan 2014 44 / 46

...

Conclusions

Fast three-step procedure for change-points inference.

Step 1: Detection using Scan statisticsStep 2: Estimation by model selection information criterion.Step 3: Extended window for CI

Advantages:

FastFew and non-sensitive tuning parametersEspecially suitable for large n small m case.

Chun Yip Yau (CUHK) LR Scan for Change Points 15 Jan 2014 45 / 46

...

Thank You!

Chun Yip Yau (CUHK) LR Scan for Change Points 15 Jan 2014 46 / 46