Online Analysis of Time Series by the Qn Estimator

Robin Nunkesser¹, Karen Schettlinger², Roland Fried², Ursula Gather²

¹ Department of Computer Science, University of Dortmund

² Department of Statistics, University of Dortmund

ERCIM WG Meeting on "Computing & Statistics" 2007
CFE 2007

Nunkesser, Schettlinger, Fried, Gather Online Analysis of Time Series by the Qn ERCIM WG / CFE 2007 1 / 21

Outline

1 Introduction

2 Computation
    Offline Computation
    Online Computation
    Running Time Simulations

3 Application in a Simulation Study


Motivation: Intensive Care Online Monitoring

Arterial Mean Pressure of Two Patients

[Figure: arterial mean pressure (mmHg) against time (min) for two patients]

Goal: fast detection of relevant patterns (level/trend changes) that resists outliers


Signal and Noise Model for $(y_t)$

Model

$y_t = \mu_t + u_t + v_t$

$\mu_t$ (signal): smooth with a few sudden shifts
$u_t$ (observational noise): symmetric, mean zero
$v_t$ (spiky noise): measurement artifacts

Moving window $(y_{t-m+1}, \ldots, y_t, \ldots, y_{t+h})$ to approximate $\mu_t$

Choice of Width $n = m + h$
↑ larger $n$ improves variance and robustness
↓ smaller $n$ improves bias, computation time and delay
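To make the window indexing concrete, here is a minimal Python sketch. The function names `moving_window` and `running_level` are illustrative and not from the talk, and the window median is only an assumed stand-in for a robust level estimate of the signal. Time indices are 0-based, and windows that do not fit inside the series are skipped.

```python
from statistics import median

def moving_window(y, t, m, h):
    """Return [y[t-m+1], ..., y[t+h]] (0-based t), or None if it does not fit."""
    start, stop = t - m + 1, t + h + 1
    if start < 0 or stop > len(y):
        return None
    return y[start:stop]

def running_level(y, m, h):
    """Median of each full window, used here as a simple robust level estimate."""
    levels = []
    for t in range(len(y)):
        w = moving_window(y, t, m, h)
        levels.append(None if w is None else median(w))
    return levels

# Toy series: level 1, one spike (50.0), then a shift to level 5.
y = [1.0, 1.2, 0.9, 50.0, 1.1, 1.0, 5.1, 4.9, 5.0, 5.2]
levels = running_level(y, m=3, h=2)   # window width n = m + h = 5
```

The parameter `h` is the number of future observations the window waits for, which is why a larger `h` means a longer delay before the estimate at time `t` is available.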


The Qn Scale Estimator

Definition (Rousseeuw and Croux (1993))
For $n$ points $x_i \in \mathbb{R}$ and $h = \lfloor n/2 \rfloor + 1$ the estimator $Q_n$ is defined as

$Q_n = d_n \cdot 2.2219 \cdot \{|x_i - x_j|;\ i < j\}_{\left(\binom{h}{2}\right)}$.
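The definition can be evaluated directly by sorting all pairwise absolute differences and taking the one with rank $\binom{h}{2}$. The sketch below does this by brute force in Python. The name `qn_naive` is illustrative, and the finite-sample factor $d_n$ is not given on the slide, so it is exposed as a parameter with an assumed default of 1.

```python
from itertools import combinations
from math import comb

def qn_naive(x, d_n=1.0):
    """Qn by brute force: sorts all n(n-1)/2 absolute differences.

    d_n is the finite-sample correction factor; its value is not given
    in the source, so 1.0 is only a placeholder.
    """
    n = len(x)
    if n < 2:
        raise ValueError("Qn needs at least two observations")
    h = n // 2 + 1
    k = comb(h, 2)  # 1-based rank of the order statistic
    diffs = sorted(abs(a - b) for a, b in combinations(x, 2))
    return d_n * 2.2219 * diffs[k - 1]
```

For `[1, 2, 3, 4, 5]` the third smallest difference is 1, and it is still 1 when the last value is replaced by 500, so a single outlier leaves the estimate unchanged in this small case.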

Gaussian Efficiencies of Different Robust Scale Estimators

[Figure: efficiency (axis 0 to 100) against sample size (0 to 50) for Qn, Sn, length of the shortest half (LSH), interquartile range (IQR) and median absolute deviation (MAD)]

Problem Transformation

Transformation of the Qn

$Q_n = c \cdot \{|x_i - x_j|;\ i < j\}_{(k)} = c \cdot \{x_{(i)} - x_{(n-j+1)};\ 1 \le i, j \le n\}_{\left(k + n + \binom{n}{2}\right)}$

Qn Computation by Selecting a Rank in a Matrix of Type X + Y

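\[
\begin{pmatrix}
x_{(n)} + (-x_{(1)}) & \cdots & \cdots & 0 \\
\vdots & & 0 & x_{(n-1)} + (-x_{(n)}) \\
x_{(2)} + (-x_{(1)}) & 0 & & \vdots \\
0 & x_{(1)} + (-x_{(2)}) & \cdots & x_{(1)} + (-x_{(n)})
\end{pmatrix}
\]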
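The rank shift in the transformation can be checked numerically. The matrix holds all $n^2$ differences of order statistics: $\binom{n}{2}$ non-positive ones, $n$ zeros on the anti-diagonal, and the $\binom{n}{2}$ absolute differences. The following Python sketch compares both sides by brute force; the function names are illustrative.

```python
from itertools import combinations
from math import comb

def kth_abs_difference(x, k):
    """k-th smallest (1-based) of |x_i - x_j| over all pairs i < j."""
    return sorted(abs(a - b) for a, b in combinations(x, 2))[k - 1]

def kth_via_matrix(x, k):
    """Same value, read off as rank k + n + C(n, 2) among all n^2 matrix entries."""
    xs = sorted(x)
    n = len(xs)
    entries = sorted(a - b for a in xs for b in xs)  # every x_(i) + (-x_(j))
    return entries[k + n + comb(n, 2) - 1]
```

This only verifies the identity. It does not exploit the sorted rows and columns of the matrix, which is what the selection algorithms rely on.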

Definition (Hodges-Lehmann estimator)

$\hat{\mu} = \operatorname{median}\left\{ \frac{x_i + x_j}{2},\ 1 \le i \le j \le n \right\}$
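A direct, quadratic-time evaluation of the Hodges-Lehmann definition is sketched below in Python (the function name is illustrative). The pairwise means are sums of the form $x_i + x_j$, so selecting their median is presumably another instance of rank selection in a matrix of type X + Y; the sketch makes no use of that structure.

```python
from statistics import median

def hodges_lehmann(x):
    """Median of all pairwise means (x_i + x_j) / 2 with i <= j."""
    n = len(x)
    return median((x[i] + x[j]) / 2 for i in range(n) for j in range(i, n))
```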

The Offline Algorithm of Johnson and Mizoguchi

1 Select one of the (remaining) matrix elements
2 Divide the remaining elements in O(n) time into three parts by
    1 Finding elements certainly smaller than the selected element
    2 Finding elements certainly greater than the selected element
3 Determine one or two of the parts to exclude from the search
4 If fewer than n elements remain, determine the sought-after element, else go to step 1

If the selection in step 1 is done by means of the weighted median, the algorithm takes optimal O(n log n) time
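The loop structure of these steps can be sketched in Python as follows. This is a simplified variant and not the algorithm of Johnson and Mizoguchi itself: the pivot in step 1 is drawn at random instead of by a weighted median, so the running time is only expected, not guaranteed, and the early exit of step 4 is left out. All names are illustrative, and $d_n$ is assumed to be 1.

```python
import random
from math import comb

def kth_smallest_pair_sum(x, y, k, seed=0):
    """k-th smallest (1-based) entry of the matrix x[i] + y[j]."""
    x, y = sorted(x), sorted(y)
    n, m = len(x), len(y)
    if not 1 <= k <= n * m:
        raise ValueError("k out of range")
    lo, hi = [0] * n, [m] * n          # row i still holds columns lo[i]..hi[i]-1
    rng = random.Random(seed)
    while True:
        # Step 1: select one of the remaining elements (here: at random).
        r = rng.randrange(sum(h - l for l, h in zip(lo, hi)))
        for i in range(n):
            width = hi[i] - lo[i]
            if r < width:
                pivot = x[i] + y[lo[i] + r]
                break
            r -= width
        # Step 2: one staircase walk counts, per row, the entries
        # smaller than the pivot and the entries not greater than it.
        less, leq = [0] * n, [0] * n
        jl = jle = m
        for i in range(n):             # x[i] grows, so both counts shrink
            while jl > 0 and x[i] + y[jl - 1] >= pivot:
                jl -= 1
            while jle > 0 and x[i] + y[jle - 1] > pivot:
                jle -= 1
            less[i], leq[i] = jl, jle
        # Step 3: exclude the parts that cannot contain rank k.
        if k <= sum(less):
            hi = [min(h, c) for h, c in zip(hi, less)]
        elif k <= sum(leq):
            return pivot
        else:
            lo = [max(l, c) for l, c in zip(lo, leq)]

def qn_by_selection(x):
    """Qn via rank selection in the matrix x_(i) + (-x_(j)); d_n assumed to be 1."""
    xs = sorted(x)
    n = len(xs)
    h = n // 2 + 1
    rank = comb(h, 2) + n + comb(n, 2)
    return 2.2219 * kth_smallest_pair_sum(xs, [-v for v in xs], rank)
```

Each round costs O(n) and removes at least the pivot from the candidates, so the search always terminates; with random pivots the number of rounds is logarithmic in expectation.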


Concept of the Online Algorithm

Use a buffer of size O(n) and data structures supporting fast update

Indexed AVL tree
Getting, inserting, removing and rank information available in O(log n)

Each element in X, Y and the buffer is stored in an IAVL and supported by two pointers

[Figure: The Data Structure Used]

Updating needs O(|inserted/deleted buffer elements| log n) per update
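The interface such a structure offers can be illustrated with a small Python stand-in. This is not an indexed AVL tree: it keeps a plain sorted list, so lookups are O(log n) but insertions and removals cost O(n). The class and function names are hypothetical, and the two pointers per element and the buffer mentioned above are not modelled.

```python
from bisect import bisect_left, insort

class SortedWindow:
    """List-backed stand-in offering get / insert / remove / rank."""

    def __init__(self, values=()):
        self._items = sorted(values)

    def insert(self, value):
        insort(self._items, value)

    def remove(self, value):
        i = bisect_left(self._items, value)
        if i == len(self._items) or self._items[i] != value:
            raise KeyError(value)
        del self._items[i]

    def rank(self, value):
        """Number of stored elements strictly smaller than value."""
        return bisect_left(self._items, value)

    def get(self, index):
        """Element at the given position in sorted order (0-based)."""
        return self._items[index]

    def __len__(self):
        return len(self._items)

def slide(window, oldest, newest):
    """Moving-window update: drop the oldest value, add the newest one.

    Returns the sorted positions at which the removal and the insertion
    took place, i.e. the update positions.
    """
    removed_at = window.rank(oldest)
    window.remove(oldest)
    window.insert(newest)
    return removed_at, window.rank(newest)
```

Replacing the list by a balanced tree that stores subtree sizes would give the O(log n) bounds stated above without changing this interface.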

Example

[Figure: Example of a Column Deletion]

Column Deletion
1 Search in X for the element that is to be deleted (O(log n))
2 Follow pointers to the largest/smallest element in the column (O(1))
3 Determine the rows of these elements (O(1))
4 Compute and delete all elements in between from the buffer (O(|deleted elements| log n))

Properties of the Online Algorithm

Drawback
In a worst-case scenario we still need O(√n + k′ log n) accumulated time per update with k′ = O(n²)

But...
If every update position has equal probability the expected accumulated time per update is O(log n)

This case occurs e.g. for a constant signal with stationary noise
General bound: O(Prob(most probable update position) · n log n)

We expect good behaviour in practice
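As a quick consistency check of the general bound, assume that an update can fall on one of about n positions, each with probability 1/n (the exact number of positions is an assumption here, not a figure from the slides):

```latex
\[
O\bigl(\Pr(\text{most probable update position}) \cdot n \log n\bigr)
  = O\!\left(\tfrac{1}{n} \cdot n \log n\right)
  = O(\log n)
\]
```

This agrees with the expected time stated for equally likely update positions. At the other extreme, a position with probability close to 1 gives O(n log n).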


Running Time Simulations

Simulated Datasets
1 Constant signal with standard normal noise and 10% outliers
2 Additional shifts and trends
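Series of this kind could be generated along the following lines in Python. The slides do not state the outlier size, the shift heights or the trend slopes, so `outlier_size`, `shifts` and `trend` are assumed parameters with arbitrary values, and the function name `simulate` is illustrative.

```python
import random

def simulate(n=200, outlier_fraction=0.10, outlier_size=8.0,
             shifts=(), trend=0.0, seed=1):
    """Signal plus standard normal noise plus additive outliers.

    shifts is a sequence of (start_time, height) pairs; trend is a slope
    per time step. All default values are assumptions for illustration.
    """
    rng = random.Random(seed)
    series = []
    for t in range(n):
        level = trend * t + sum(height for start, height in shifts if t >= start)
        value = level + rng.gauss(0.0, 1.0)
        if rng.random() < outlier_fraction:
            value += outlier_size      # additive outlier on about 10% of points
        series.append(value)
    return series

constant = simulate()                                        # dataset type 1
with_patterns = simulate(shifts=((100, 20.0),), trend=0.1)   # dataset type 2
```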

[Figure: Simulated Datasets, two panels showing x against time (0 to 200)]

Running Time Simulations

Steps Performed per Update

[Figure: steps per update (0 to 500) against n (0 to 500); legend: Measured Buffer Positions]


Shift Detection

Comparison of Two Window Level Estimates Right and Left of Time t

[Figure: Local constant level, plotted against time with marks at t−m+1, t and t+h]

$T(y_t) = \dfrac{\hat{\mu}_{t+} - \hat{\mu}_{t-}}{\sqrt{c \left( \dfrac{\hat{\sigma}^2_{t+}}{h} + \dfrac{\hat{\sigma}^2_{t-}}{m} \right)}}$

Two-sample t-test (Stein & Follow, 1985)
Trimmed two-sample t-test (Hou & Koh, 2003)
Median comparison (Bovik & Munson, 1986, Hwang & Haddad, 1994)
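A comparison of the two window levels can be sketched in Python as a standardised difference. The sketch assumes the statistic has the form (right level − left level) / sqrt(c · (right variance / h + left variance / m)), with half-window medians as level estimates and a brute-force Qn as scale estimate. The constant `c`, the function names and the choice $d_n = 1$ are assumptions for illustration, not values from the talk.

```python
from itertools import combinations
from math import comb, sqrt
from statistics import median

def qn(x):
    """Brute-force Qn with the finite-sample factor d_n assumed to be 1."""
    h = len(x) // 2 + 1
    diffs = sorted(abs(a - b) for a, b in combinations(x, 2))
    return 2.2219 * diffs[comb(h, 2) - 1]

def shift_statistic(y, t, m, h, c=1.0):
    """Standardised level difference between y[t+1..t+h] and y[t-m+1..t]."""
    if t - m + 1 < 0 or t + h >= len(y):
        raise ValueError("windows do not fit into the series")
    left = y[t - m + 1 : t + 1]
    right = y[t + 1 : t + h + 1]
    spread = c * (qn(right) ** 2 / h + qn(left) ** 2 / m)
    if spread == 0:
        raise ValueError("scale estimate is zero in both windows")
    return (median(right) - median(left)) / sqrt(spread)

# Level 0 up to t = 6, level 5 afterwards; widths m = h = 7.
y = [0.1, -0.2, 0.0, 0.3, -0.1, 0.2, -0.3,
     5.1, 4.8, 5.0, 5.3, 4.9, 5.2, 4.7]
T = shift_statistic(y, t=6, m=7, h=7)
```

A shift would be flagged when the absolute value of the statistic exceeds a threshold, which has to be calibrated separately.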


Power in the Standard Normal Case

[Figure: power (0 to 100) against shift size, for widths m = h = 7 (shift size 0 to 15) and m = h = 15 (shift size 0 to 6); legend: t-test, 30%-trimmed t-test, Qn, Sn, LSH, IQR, MAD]

Power in Case of a Single Outlier of Increasing Size

[Figure: power (0 to 100) against outlier size (0 to 20), for widths m = h = 7 and m = h = 15; legend: t-test, 30%-trimmed t-test, Qn, Sn, IQR]

Increasing Number of Outliers

[Figure: power (0 to 100) against number of outliers, for widths m = h = 7 (0 to 7 outliers) and m = h = 15 (0 to 14 outliers); legend: Qn, Sn, IQR]

Example: Piecewise Constant Signal

Moving window filters (width m = h = 9)

[Figure: filter output (mmHg) against time (0 to 300 min); legend: Modified Trimmed Mean, Median with MAD, Median with Qn]

Estimation of the Autocorrelation Function

SSD Method (Ma and Genton, 2000)

$\rho(1) = \dfrac{\operatorname{Var}(Y_{t+1} + Y_t) - \operatorname{Var}(Y_{t+1} - Y_t)}{\operatorname{Var}(Y_{t+1} + Y_t) + \operatorname{Var}(Y_{t+1} - Y_t)}$
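The identity can be turned into an estimator by replacing each variance with a squared scale estimate of the sums and differences of neighbouring observations. The Python sketch below takes the scale estimator as an argument; the function names are illustrative, and the brute-force `qn` again assumes $d_n = 1$. Any constant factor of the scale estimator cancels in the ratio.

```python
from itertools import combinations
from math import comb

def qn(x):
    """Brute-force Qn with the finite-sample factor d_n assumed to be 1."""
    h = len(x) // 2 + 1
    diffs = sorted(abs(a - b) for a, b in combinations(x, 2))
    return 2.2219 * diffs[comb(h, 2) - 1]

def lag_one_autocorrelation(y, scale):
    """Estimate rho(1) from the scales of Y_{t+1} + Y_t and Y_{t+1} - Y_t."""
    sums = [b + a for a, b in zip(y, y[1:])]
    diffs = [b - a for a, b in zip(y, y[1:])]
    v_plus, v_minus = scale(sums) ** 2, scale(diffs) ** 2
    return (v_plus - v_minus) / (v_plus + v_minus)
```

Passing `statistics.pstdev` as `scale` gives a non-robust version based on the empirical variance, while passing `qn` gives the robust variant. The brute-force `qn` is quadratic in the series length, so it is only suitable for short series.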

AR(1) Process with Normal Innovations

[Figure: mean square error (MSE, 0.00 to 0.08) against lag-one correlation ρ(1) (−1.0 to 1.0); legend: MAD, Sn, Qn, Emp. Var.]

In Case of Outliers

[Figure: mean square error (MSE) against lag-one correlation ρ(1) (−1.0 to 1.0), two panels: "Single additive 6σ-outlier" and "5% symmetric additive contamination of size 3"; legend: Emp. Var., Qn, Sn, MAD]

Summary

New online algorithm which is substantially faster in practice
Shift detection based on half-window medians and Qn scale estimator
Choose thresholds for shift detection based on estimated lag-1 autocorrelation derived using the Qn once more


Thank you!


Bibliography

Bernholt, T., Fried, R., 2003. Computing the update of the repeated median regression line in linear time. Inf. Process. Lett. 88, 111–117.

Bernholt, T., Fried, R., Gather, U., Wegener, I., 2006. Modified repeated median filters. Statistics and Computing 16 (2), 177–192.

Davies, P. L., Fried, R., Gather, U., May 2004. Robust signal extraction for on-line monitoring data. J. Stat. Plan. Infer. 122 (1-2), 65–78.

Fried, R., Jun. 2004. Robust filtering of time series with trends. Journal of Nonparametric Statistics 16 (3-4), 313–328.

Fried, R., Bernholt, T., Gather, U., May 2006. Repeated median and hybrid filters. Comput. Statist. Data. Anal. 50 (9), 2313–2338.

Gather, U., Fried, R., 2003. Robust scale estimation for local linear temporal trends. Tatra Mt. Math. Publ. 26, 87–101.

Gather, U., Fried, R., 2004. Methods and algorithms for robust filtering. In: Proceedings of COMPSTAT 2004. Physica Verlag, Heidelberg, pp. 159–170.

Rousseeuw, P. J., Croux, C., December 1993. Alternatives to the median absolute deviation. Journal of the American Statistical Association 88 (424), 1273–1283.