Online Analysis of Time Series by the Qn Estimator
Robin Nunkesser (1), Karen Schettlinger (2), Roland Fried (2), Ursula Gather (2)
(1) Department of Computer Science, University of Dortmund
(2) Department of Statistics, University of Dortmund
ERCIM WG Meeting on "Computing & Statistics" 2007 / CFE 2007
Nunkesser, Schettlinger, Fried, Gather Online Analysis of Time Series by the Qn ERCIM WG / CFE 2007 1 / 21
Outline
1 Introduction
2 Computation
    Offline Computation
    Online Computation
    Running Time Simulations
3 Application in a Simulation Study
Motivation: Intensive Care Online Monitoring
Arterial Mean Pressure of Two Patients
[Figure: arterial mean pressure (mmHg) versus time (min) for two patients]
Goal
Fast detection of relevant patterns (level/trend changes) while resisting outliers
Signal and Noise Model for $(y_t)$
Model
$y_t = \mu_t + u_t + v_t$
$\mu_t$ (signal): smooth with a few sudden shifts
$u_t$ (observational noise): symmetric, mean zero
$v_t$ (spiky noise): measurement artifacts
Moving window $(y_{t-m+1}, \ldots, y_t, \ldots, y_{t+h})$ to approximate $\mu_t$
Choice of Width $n = m + h$
↑ (a larger n helps): variance, robustness
↓ (a smaller n helps): bias, computation time, delay
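As a rough illustration of the moving-window idea, the Python sketch below approximates the level at time t by the median of the window from t-m+1 to t+h. The function name `moving_window_median` and the use of the median as level estimate at this point are assumptions made for illustration only.

```python
from statistics import median


def moving_window_median(y, m, h):
    """Return (t, level) pairs; level is the median of y[t-m+1 .. t+h].

    Indices are 0-based and only time points with a complete window
    of n = m + h observations are reported.
    """
    out = []
    for t in range(m - 1, len(y) - h):
        window = y[t - m + 1 : t + h + 1]  # m values up to t, h values after t
        out.append((t, median(window)))
    return out


# Toy series: a spike at index 3 and a level shift from index 6 onwards.
y = [0, 0, 0, 9, 0, 0, 5, 5, 5, 5, 5, 5]
for t, level in moving_window_median(y, m=3, h=2):
    print(t, level)
```

A larger h adds more future observations to each window, so the estimate for time t only becomes available h steps later.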
The Qn Scale Estimator
Definition (Rousseeuw and Croux (1993))
For n points $x_i \in \mathbb{R}$ and $h = \lfloor n/2 \rfloor + 1$ the estimator $Q_n$ is defined as
$$Q_n = d_n \cdot 2.2219 \cdot \{|x_i - x_j|;\ i < j\}_{(k)}, \qquad k = \binom{h}{2}.$$
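The definition can be turned directly into a brute-force computation. The sketch below sorts all pairwise absolute differences and picks the one with rank C(h, 2). It is a naive O(n^2 log n) illustration, not the fast algorithm discussed later; the name `qn_scale` and the default `d_n = 1.0` (no finite-sample correction) are assumptions.

```python
from itertools import combinations
from math import comb


def qn_scale(x, d_n=1.0):
    """Naive Qn: d_n * 2.2219 * k-th smallest |x_i - x_j| over i < j.

    d_n = 1.0 is a placeholder; the sample-size dependent factor
    is not specified here.
    """
    n = len(x)
    if n < 2:
        raise ValueError("Qn needs at least two observations")
    h = n // 2 + 1
    k = comb(h, 2)
    diffs = sorted(abs(a - b) for a, b in combinations(x, 2))
    return d_n * 2.2219 * diffs[k - 1]


# One gross outlier (100) barely influences the estimate.
print(qn_scale([1, 2, 3, 4, 100]))
```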
Gaussian Efficiencies of Different Robust Scale Estimators
[Figure: efficiency (0 to 100) versus sample size (0 to 50); legend: Qn, Sn, Length of the shortest half (LSH), Interquartile range (IQR), Median absolute deviation (MAD)]
Problem Transformation
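Transformation of the Qn
$$Q_n = c \cdot \{|x_i - x_j|;\ i < j\}_{(k)} = c \cdot \{x_{(i)} - x_{(n-j+1)};\ 1 \le i, j \le n\}_{\left(k + n + \binom{n}{2}\right)}$$
Qn Computation by Selecting a Rank in a Matrix of Type X + Y
$$\begin{pmatrix}
x_{(n)} + (-x_{(1)}) & \cdots & \cdots & 0 \\
\vdots & & 0 & x_{(n-1)} + (-x_{(n)}) \\
x_{(2)} + (-x_{(1)}) & 0 & & \vdots \\
0 & x_{(1)} + (-x_{(2)}) & \cdots & x_{(1)} + (-x_{(n)})
\end{pmatrix}$$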
Definition (Hodges-Lehmann estimator)
$$\hat{\mu} = \mathrm{median}\left\{ \frac{x_i + x_j}{2},\ 1 \le i \le j \le n \right\}$$
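The rank shift in the transformation can be checked numerically: of the n^2 entries of the matrix, C(n, 2) are non-positive mirror images of the pairwise differences and n lie on the zero anti-diagonal, so the k-th smallest absolute difference sits at rank k + n + C(n, 2). The brute-force sketch below compares both readings and also evaluates the Hodges-Lehmann estimator from its definition. All function names are hypothetical.

```python
from itertools import combinations, combinations_with_replacement
from math import comb
from statistics import median


def kth_abs_difference(x, k):
    """k-th smallest |x_i - x_j| over i < j (k is 1-based), by brute force."""
    return sorted(abs(a - b) for a, b in combinations(x, 2))[k - 1]


def kth_via_x_plus_y(x, k):
    """The same value, read off at rank k + n + C(n, 2) of the matrix X + Y."""
    xs = sorted(x)
    n = len(xs)
    # Entry (i, j) is x_(i) + (-x_(n-j+1)); the indices here are 0-based.
    entries = sorted(xs[i] - xs[n - 1 - j] for i in range(n) for j in range(n))
    return entries[k + n + comb(n, 2) - 1]


def hodges_lehmann(x):
    """Median of the pairwise averages (x_i + x_j) / 2 with i <= j."""
    return median((a + b) / 2 for a, b in combinations_with_replacement(x, 2))


x = [3.0, 1.0, 4.0, 1.5, 9.0, 2.5]
k = comb(len(x) // 2 + 1, 2)
print(kth_abs_difference(x, k), kth_via_x_plus_y(x, k), hodges_lehmann(x))
```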
The Offline Algorithm of Johnson and Mizoguchi
1 Select one of the (remaining) matrix elements
2 Divide the remaining elements in O(n) time into three parts by
    1 Finding elements certainly smaller than the selected element
    2 Finding elements certainly greater than the selected element
3 Determine one or two of the parts to exclude from the search
4 If less than n elements remain determine the sought-after element, else go to step 1

If the selection in step 1 is done by means of the weighted median, the algorithm takes optimal O(n log n) time
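One standard way to obtain an O(n) split in a matrix with sorted rows and columns is a staircase walk. The sketch below counts the entries of X + Y that are smaller than a selected element without building the matrix. It illustrates only this counting step, not the full algorithm of Johnson and Mizoguchi, and `count_less` is a hypothetical name.

```python
def count_less(xs, ys, pivot):
    """Count pairs (i, j) with xs[i] + ys[j] < pivot in O(len(xs) + len(ys)).

    Both xs and ys must be sorted in ascending order.
    """
    count = 0
    j = len(ys)  # number of ys that are still small enough for the current xs[i]
    for xi in xs:
        # xi only grows, so the admissible prefix of ys can only shrink.
        while j > 0 and xi + ys[j - 1] >= pivot:
            j -= 1
        count += j
    return count


x = [1, 2, 3]
xs = sorted(x)
ys = sorted(-v for v in x)  # Y holds the negated observations
print(count_less(xs, ys, 0))  # number of strictly negative differences
```

Knowing how many entries lie below the selected element tells the search which part of the matrix still contains the sought-after rank.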
Concept of the Online Algorithm
Use a buffer of size O(n) and data structures supporting fast update

Indexed AVL tree (IAVL)
Getting, inserting, removing and rank information available in O(log n)

Each element in X, Y and the buffer is stored in an IAVL and supported by two pointers

[Figure: The Data Structure Used]

Updating needs O(|inserted/deleted buffer elements| log n) per update
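To make the required operations concrete, the sketch below offers the same interface (insert, remove, rank, get by index) on top of a sorted Python list. This is a stand-in, not an indexed AVL tree: lookups are O(log n) via `bisect`, but insertion and removal cost O(n) because list elements are shifted. The class name `RankedList` is hypothetical.

```python
from bisect import bisect_left, insort


class RankedList:
    """Sorted-list stand-in for an indexed AVL tree (insert/remove are O(n))."""

    def __init__(self):
        self._items = []

    def insert(self, value):
        insort(self._items, value)

    def remove(self, value):
        i = bisect_left(self._items, value)
        if i == len(self._items) or self._items[i] != value:
            raise KeyError(value)
        del self._items[i]

    def rank(self, value):
        """Number of stored elements strictly smaller than value."""
        return bisect_left(self._items, value)

    def get(self, index):
        """Element with 0-based rank `index`."""
        return self._items[index]

    def __len__(self):
        return len(self._items)


window = RankedList()
for v in (5.0, 1.0, 3.0):
    window.insert(v)
window.remove(1.0)  # the oldest observation leaves the window
window.insert(4.0)  # a new observation enters
print(window.get(0), window.rank(4.0), len(window))
```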
Example
Example of a Column Deletion
Column Deletion
1 Search in X for the element that is to be deleted (O(log n))
2 Follow pointers to the largest/smallest element in the column (O(1))
3 Determine the rows of these elements (O(1))
4 Compute and delete all elements in between from the buffer (O(|deleted elements| log n))
Properties of the Online Algorithm
Drawback
In a worst-case scenario we still need O(√n + k′ log n) accumulated time per update with k′ = O(n²)

But...
If every update position has equal probability, the expected accumulated time per update is O(log n)
This case occurs e.g. for a constant signal with stationary noise
General bound: O(Prob(most probable update position) · n log n)
We expect good behaviour in practice
Running Time Simulations
Simulated Datasets
1 Constant signal with standard normal noise and 10% outliers
2 Additional shifts and trends

[Figure: Simulated Datasets, two panels showing x versus time (0 to 200)]
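Data of the two kinds listed above could be generated along the following lines. The slides do not state the outlier size, the outlier positions, or the shift and trend parameters, so `outlier_size`, the random placement, and all arguments of `add_shift_and_trend` are assumptions chosen for illustration.

```python
import random


def simulate_constant(n, outlier_fraction=0.10, outlier_size=10.0, seed=0):
    """Constant signal (level 0) with N(0, 1) noise and additive outliers.

    The outlier size and the random outlier positions are assumed values.
    """
    rng = random.Random(seed)
    y = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for t in rng.sample(range(n), round(outlier_fraction * n)):
        y[t] += outlier_size
    return y


def add_shift_and_trend(y, shift_at, shift_size, trend_from, slope):
    """Add a level shift at `shift_at` and a linear trend from `trend_from`."""
    out = []
    for t, value in enumerate(y):
        if t >= shift_at:
            value += shift_size
        if t >= trend_from:
            value += slope * (t - trend_from)
        out.append(value)
    return out


y1 = simulate_constant(200)
y2 = add_shift_and_trend(y1, shift_at=100, shift_size=5.0, trend_from=150, slope=0.1)
print(len(y1), len(y2))
```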
Running Time Simulations
Steps Performed per Update
[Figure: steps per update (0 to 500) versus n (0 to 500); label: Measured Buffer Positions]
Shift Detection
Comparison of Two Window Level Estimates Right and Left of Time t
[Figure: local constant level in the windows left and right of time t; time axis marked at t-m+1, t, t+h]
$$T(y_t) = \frac{\hat{\mu}_{t+} - \hat{\mu}_{t-}}{s_c\left(\frac{\hat{\sigma}^2_{t+}}{h} + \frac{\hat{\sigma}^2_{t-}}{m}\right)}$$
Two-sample t-test (Stein & Follow, 1985)
Trimmed two-sample t-test (Hou & Koh, 2003)
Median comparison (Bovik & Munson, 1986, Hwang & Haddad, 1994)
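A median comparison standardized by the Qn could look like the sketch below. It takes the medians of the left window (t-m+1 to t) and the right window (t+1 to t+h) as level estimates and the naive Qn as scale estimate. The slides do not spell out the standardization, so the square-root pooling used here, the omission of finite-sample corrections, and the name `shift_statistic` are assumptions.

```python
from itertools import combinations
from math import comb, sqrt
from statistics import median


def qn_scale(x):
    """Naive Qn without finite-sample correction (assumed d_n = 1)."""
    k = comb(len(x) // 2 + 1, 2)
    diffs = sorted(abs(a - b) for a, b in combinations(x, 2))
    return 2.2219 * diffs[k - 1]


def shift_statistic(y, t, m, h):
    """Standardized difference of the window medians right and left of t."""
    if t - m + 1 < 0 or t + h >= len(y):
        raise ValueError("window does not fit into the series")
    left = y[t - m + 1 : t + 1]
    right = y[t + 1 : t + h + 1]
    # Assumed pooling, in analogy to a two-sample t statistic.
    pooled = sqrt(qn_scale(right) ** 2 / h + qn_scale(left) ** 2 / m)
    if pooled == 0:
        raise ValueError("both scale estimates are zero")
    return (median(right) - median(left)) / pooled


y = [0.1, -0.2, 0.0, 0.3, -0.1, 0.2, -0.3,
     5.1, 4.8, 5.0, 5.3, 4.9, 5.2, 4.7]
print(shift_statistic(y, t=6, m=7, h=7))
```

A shift would be signalled when the absolute value of the statistic exceeds a threshold; how that threshold is chosen is not covered by this sketch.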
Power in the Standard Normal Case
[Figure: power (0 to 100) versus shift size; panels: widths m = h = 7 (shift size 0 to 15) and widths m = h = 15 (shift size 0 to 6); legend: t-test, 30%-trimmed t-test, Qn, Sn, LSH, MAD, IQR]
Power in Case of a Single Outlier of Increasing Size
[Figure: power (0 to 100) versus outlier size (0 to 20); panels: widths m = h = 7 and widths m = h = 15; legend: t-test, 30%-trimmed t-test, Qn, Sn, IQR]
Increasing Number of Outliers
[Figure: power (0 to 100) versus number of outliers; panels: widths m = h = 7 (0 to 7 outliers) and widths m = h = 15 (0 to 14 outliers); legend: Qn, Sn, IQR]
Example: Piecewise Constant Signal
Moving window filters (width m = h = 9)
[Figure: signal in mmHg (0 to 15) versus time (0 to 300 min); legend: Modified Trimmed Mean, Median with MAD, Median with Qn]
Estimation of the Autocorrelation Function
SSD Method (Ma and Genton, 2000)
$$\rho(1) = \frac{\mathrm{Var}(Y_{t+1} + Y_t) - \mathrm{Var}(Y_{t+1} - Y_t)}{\mathrm{Var}(Y_{t+1} + Y_t) + \mathrm{Var}(Y_{t+1} - Y_t)}$$
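A robust version of this identity replaces each variance by a squared robust scale estimate of the sums and of the differences. The sketch below does this with a naive Qn. Because both series have the same length, the constant 2.2219 and any finite-sample factor cancel in the ratio. The name `robust_lag1_autocorrelation` is hypothetical and the code is an illustration, not the implementation evaluated in the talk.

```python
from itertools import combinations
from math import comb


def qn_scale(x):
    """Naive Qn without finite-sample correction (assumed d_n = 1)."""
    k = comb(len(x) // 2 + 1, 2)
    diffs = sorted(abs(a - b) for a, b in combinations(x, 2))
    return 2.2219 * diffs[k - 1]


def robust_lag1_autocorrelation(y):
    """Lag-one autocorrelation from scales of sums and differences."""
    sums = [b + a for a, b in zip(y, y[1:])]
    diffs = [b - a for a, b in zip(y, y[1:])]
    q_plus = qn_scale(sums) ** 2
    q_minus = qn_scale(diffs) ** 2
    if q_plus + q_minus == 0:
        raise ValueError("both scale estimates are zero")
    return (q_plus - q_minus) / (q_plus + q_minus)


# A perfectly linear series has constant differences, so the estimate is 1.
print(robust_lag1_autocorrelation(list(range(10))))
```

By construction the estimate always lies between -1 and 1.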
AR(1) Process with Normal Innovations
[Figure: mean square error (MSE) versus lag-one correlation ρ(1) from -1 to 1; legend: MAD, Sn, Qn, Emp. Var.]
In Case of Outliers
[Figure, panel "Single additive 6σ-outlier": mean square error (MSE) versus lag-one correlation ρ(1) from -1 to 1]
[Figure, panel "5% symmetric additive contamination of size 3": mean square error (MSE) versus lag-one correlation ρ(1) from -1 to 1]
Legend: Emp. Var., Qn, Sn, MAD
Summary
New online algorithm which is substantially faster in practice
Shift detection based on half-window medians and Qn scale estimator
Choose thresholds for shift detection based on estimated lag-1 autocorrelation derived using the Qn once more
Thank you!
Bibliography
Bernholt, T., Fried, R., 2003. Computing the update of the repeated median regression line in linear time. Inf. Process. Lett. 88, 111–117.
Bernholt, T., Fried, R., Gather, U., Wegener, I., 2006. Modified repeated median filters. Statistics and Computing 16 (2), 177–192.
Davies, P. L., Fried, R., Gather, U., May 2004. Robust signal extraction for on-line monitoring data. J. Stat. Plan. Infer. 122 (1-2), 65–78.
Fried, R., Jun. 2004. Robust filtering of time series with trends. Journal of Nonparametric Statistics 16 (3-4), 313–328.
Fried, R., Bernholt, T., Gather, U., May 2006. Repeated median and hybrid filters. Comput. Statist. Data. Anal. 50 (9), 2313–2338.
Gather, U., Fried, R., 2003. Robust scale estimation for local linear temporal trends. Tatra Mt. Math. Publ. 26, 87–101.
Gather, U., Fried, R., 2004. Methods and algorithms for robust filtering. In: Proceedings of COMPSTAT 2004. Physica Verlag, Heidelberg, pp. 159–170.
Rousseeuw, P. J., Croux, C., December 1993. Alternatives to the median absolute deviation. Journal of the American Statistical Association 88 (424), 1273–1283.