TRANSCRIPT
Automatic Outlier Detection: A Bayesian Approach
Jo-Anne Ting, University of Southern California
Aaron D'Souza, Google, Inc.
Stefan Schaal, University of Southern California
ICRA 2007, April 12, 2007
J. Ting 2
Outline
• Motivation
• Past & related work
• Bayesian regression for automatic outlier detection
  – Batch version
  – Incremental version
• Results
  – Synthetic data
  – Robotic data
• Conclusions
Motivation
• Real-world sensor data is susceptible to outliers
  – E.g., motion capture (MOCAP) data of a robotic dog
Past & Related Work
• Current methods for outlier detection may:
  – Require parameter tuning (i.e., an optimal threshold)
  – Require sampling (e.g., active sampling, Abe et al., 2006) or the setting of certain parameters, e.g., k in k-means clustering (MacQueen, 1967)
  – Assume an underlying data structure (e.g., mixture models, Fox et al., 1999)
  – Adopt a weighted linear regression model, but model the weights with some heuristic function (e.g., robust least squares, Hoaglin, 1983)
Bayesian Regression for Automatic Outlier Detection
• Consider linear regression:

  $$y_i = b^T x_i + \epsilon_i$$

• We can modify the above to get a weighted linear regression model (Gelman et al., 1995):

  $$y_i \sim \mathrm{Normal}\!\left(b^T x_i,\ \frac{\sigma^2}{w_i}\right), \qquad b \sim \mathrm{Normal}\!\left(b_0,\ \sigma^2 \Sigma_{b_0}\right)$$

  Except now:

  $$w_i \sim \mathrm{Gamma}\!\left(a_{w_i}, b_{w_i}\right)$$
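A minimal sketch of sampling from this generative model may help fix intuition. The dimensions, the regression vector, and the Gamma hyperparameter values below are illustrative assumptions, not values from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative settings (assumed): 5 inputs, 1000 samples
d, N = 5, 1000
b_true = rng.normal(size=d)        # regression vector b
sigma2 = 0.1                       # noise variance sigma^2
a_w, b_w = 2.0, 2.0                # Gamma hyperparameters (assumed values)

X = rng.normal(size=(N, d))
# w_i ~ Gamma(a_w, b_w); b_w treated here as a rate parameter
w = rng.gamma(shape=a_w, scale=1.0 / b_w, size=N)
# y_i ~ Normal(b^T x_i, sigma^2 / w_i): a small w_i inflates the noise
# variance of that sample, which is how this model accounts for outliers
y = X @ b_true + rng.normal(scale=np.sqrt(sigma2 / w))
```

The key design choice is that each sample gets its own precision scale w_i, so an outlier is simply a point whose inferred w_i is small.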
Bayesian Regression for Automatic Outlier Detection
• This Bayesian treatment of weighted linear regression:
  – Is suitable for real-time outlier detection
  – Requires no model assumptions
  – Requires no parameter tuning
Bayesian Regression for Automatic Outlier Detection
• Our goal is to infer the posterior distributions of b and w
• We can treat this as an EM problem (Dempster et al., 1977) and maximize the incomplete log likelihood:

  $$\log p(y \mid X)$$

  by maximizing the expected complete log likelihood:

  $$E\!\left[\log p(y, b, w \mid X)\right]$$
Bayesian Regression for Automatic Outlier Detection
• In the E-step, we need to calculate:

  $$E_{Q(b,w)}\!\left[\log p(y, b, w \mid X)\right]$$

  but since the true posterior over all hidden variables is analytically intractable, we make a factorial variational approximation (Hinton & van Camp, 1993; Ghahramani & Beal, 2000):

  $$Q(b, w) = Q(b)\,Q(w)$$
Bayesian Regression for Automatic Outlier Detection
• EM update equations (batch version):

  E-step:

  $$\Sigma_b = \left( \Sigma_{b_0}^{-1} + \sum_{i=1}^{N} w_i x_i x_i^T \right)^{-1}$$

  $$b = \Sigma_b \left( \Sigma_{b_0}^{-1} b_0 + \sum_{i=1}^{N} w_i y_i x_i \right)$$

  $$w_i = \frac{a_{w_i,0} + \tfrac{1}{2}}{b_{w_i,0} + \dfrac{1}{2\sigma^2}\left(y_i - b^T x_i\right)^2 + \dfrac{1}{2}\, x_i^T \Sigma_b x_i}$$

  M-step:

  $$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} \left[ w_i \left(y_i - b^T x_i\right)^2 + w_i\, x_i^T \Sigma_b x_i \right]$$

• Reminder: $y_i \sim \mathrm{Normal}\!\left(b^T x_i,\ \sigma^2 / w_i\right)$
• If the prediction error is very large, $E[w_i]$ goes to 0, so the point is downweighted
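The batch updates above can be sketched directly in NumPy. The prior values (b_0, Σ_{b_0}, a_{w,0}, b_{w,0}) and the iteration count below are illustrative assumptions, not the talk's settings:

```python
import numpy as np

def bayesian_weighted_regression(X, y, n_iter=100):
    """Sketch of the batch EM updates for Bayesian weighted linear
    regression; returns the posterior mean b, the expected weights
    E[w_i], and the noise variance sigma^2."""
    N, d = X.shape
    b0 = np.zeros(d)                    # assumed prior mean on b
    Sigma_b0_inv = 1e-3 * np.eye(d)     # assumed broad prior precision
    a_w0, b_w0 = 1.0, 1.0               # assumed Gamma prior on each w_i
    w = np.ones(N)
    sigma2 = float(np.var(y)) + 1e-8

    for _ in range(n_iter):
        # E-step: posterior covariance and mean of b given current E[w_i]
        Sigma_b = np.linalg.inv(Sigma_b0_inv + (w[:, None] * X).T @ X)
        b = Sigma_b @ (Sigma_b0_inv @ b0 + X.T @ (w * y))
        # E-step: expected weights; a large residual drives E[w_i] to 0
        resid2 = (y - X @ b) ** 2
        quad = np.einsum("ij,jk,ik->i", X, Sigma_b, X)  # x_i^T Sigma_b x_i
        w = (a_w0 + 0.5) / (b_w0 + resid2 / (2 * sigma2) + quad / 2)
        # M-step: noise variance
        sigma2 = float(np.mean(w * resid2 + w * quad))
    return b, w, sigma2
```

On data contaminated with outliers, the returned weights separate cleanly: inliers keep weights near the prior mean while outliers are driven toward zero, which is exactly the downweighting effect noted on the slide.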
Bayesian Regression for Automatic Outlier Detection
• EM update equations (incremental version):

  E-step:

  $$\Sigma_b = \left( \Sigma_{b_0}^{-1} + \Theta_k^{wxx^T} \right)^{-1}$$

  $$b = \Sigma_b \left( \Sigma_{b_0}^{-1} b_0 + \Theta_k^{wyx} \right)$$

  $$w_i = \frac{a_{w_i,0} + \tfrac{1}{2}}{b_{w_i,0} + \dfrac{1}{2\sigma^2}\left(y_i - b^T x_i\right)^2 + \dfrac{1}{2}\, x_i^T \Sigma_b x_i}$$

  M-step:

  $$\sigma^2 = \frac{1}{N_k} \left[ \Theta_k^{wy^2} - 2\, b^T \Theta_k^{wyx} + b^T \Theta_k^{wxx^T} b + \mathbf{1}^T \operatorname{diag}\!\left\{ \Theta_k^{wxx^T} \Sigma_b \right\} \right]$$

• Sufficient statistics are exponentially discounted by λ, 0 ≤ λ ≤ 1 (e.g., Ljung & Söderström, 1983):

  $$N_k = 1 + \lambda N_{k-1}$$
  $$\Theta_k^{wxx^T} = w_k x_k x_k^T + \lambda\, \Theta_{k-1}^{wxx^T}$$
  $$\Theta_k^{wyx} = w_k y_k x_k + \lambda\, \Theta_{k-1}^{wyx}$$
  $$\Theta_k^{wy^2} = w_k y_k^2 + \lambda\, \Theta_{k-1}^{wy^2}$$
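The incremental variant can be sketched as a small class that keeps only the discounted sufficient statistics, so each new sample costs O(d²) plus one d×d inverse. The priors and the discount factor below are illustrative assumptions:

```python
import numpy as np

class IncrementalOutlierRegression:
    """Sketch of the incremental EM updates with sufficient statistics
    exponentially discounted by lambda (assumed priors, not the talk's)."""

    def __init__(self, d, lam=0.999, a_w0=1.0, b_w0=1.0):
        self.lam = lam
        self.a_w0, self.b_w0 = a_w0, b_w0
        self.b0 = np.zeros(d)               # assumed prior mean on b
        self.Sigma_b0_inv = 1e-3 * np.eye(d)
        self.N = 0.0                        # N_k
        self.Swxx = np.zeros((d, d))        # Theta_k^{w x x^T}
        self.Swyx = np.zeros(d)             # Theta_k^{w y x}
        self.Swy2 = 0.0                     # Theta_k^{w y^2}
        self.b = np.zeros(d)
        self.Sigma_b = np.linalg.inv(self.Sigma_b0_inv)
        self.sigma2 = 1.0

    def update(self, x, y):
        # E-step: weight of the new point under the current model;
        # a large prediction error drives w toward 0 (outlier)
        r2 = float(y - self.b @ x) ** 2
        quad = float(x @ self.Sigma_b @ x)
        w = (self.a_w0 + 0.5) / (self.b_w0 + r2 / (2 * self.sigma2) + quad / 2)
        # Exponentially discounted sufficient statistics
        self.N = 1.0 + self.lam * self.N
        self.Swxx = w * np.outer(x, x) + self.lam * self.Swxx
        self.Swyx = w * y * x + self.lam * self.Swyx
        self.Swy2 = w * y * y + self.lam * self.Swy2
        # E-step: posterior over b from the statistics
        self.Sigma_b = np.linalg.inv(self.Sigma_b0_inv + self.Swxx)
        self.b = self.Sigma_b @ (self.Sigma_b0_inv @ self.b0 + self.Swyx)
        # M-step: noise variance from the discounted statistics
        self.sigma2 = max(
            (self.Swy2 - 2 * self.b @ self.Swyx
             + self.b @ self.Swxx @ self.b
             + np.trace(self.Swxx @ self.Sigma_b)) / self.N,
            1e-8,
        )
        return w
```

Because old statistics decay by λ at every step, the fit tracks slow drift in the data while the per-sample weight w flags outliers on the fly.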
Results: Synthetic Data
• Given noisy data (+ outliers) from a linear regression problem:
  – 5 input dimensions
  – 1000 samples
  – SNR = 10
  – 20% outliers
  – Outliers are 3σ from the output mean
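A hypothetical reconstruction of this benchmark setup is sketched below; the exact generator used in the experiments is not shown in the talk, so the regression vector, input distribution, and contamination scheme are assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)
d, N, snr, outlier_frac = 5, 1000, 10.0, 0.20

# Assumed generator: standard-normal inputs, random linear map
b_true = rng.normal(size=d)
X = rng.normal(size=(N, d))
signal = X @ b_true

# SNR = signal power / noise power
noise_var = signal.var() / snr
y = signal + rng.normal(scale=np.sqrt(noise_var), size=N)

# Contaminate 20% of samples, placed 3 sigma from the output mean
# (sigma = std. dev. of the true conditional output mean, per the slides)
idx = rng.choice(N, size=int(outlier_frac * N), replace=False)
mu_y, sigma_y = y.mean(), signal.std()
y[idx] = mu_y + 3 * sigma_y * rng.choice([-1.0, 1.0], size=idx.size)
```

Data of this shape is what the batch and incremental algorithms are compared on in the following results.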
Results: Synthetic Data Available in Batch Form
Average normalized mean squared prediction error as a function of how far outliers are from inliers (distance of outliers from the mean is at least +σ, +2σ, +3σ):

  Algorithm                                  |   +σ   |  +2σ   |  +3σ
  -------------------------------------------|--------|--------|-------
  Mixture model                              | 0.0286 | 0.0688 | 0.1327
  Robust least squares                       | 0.0880 | 0.1518 | 0.1890
  Robust regression (Faul & Tipping, 2001)   | 0.0282 | 0.0683 | 0.1320
  Bayesian weighted regression               | 0.0210 | 0.0270 | 0.0273
  Thresholding (optimally tuned)             | 0.0232 | 0.0503 | 0.0903

Data: globally linear data with 5 input dimensions evaluated in batch form, averaged over 10 trials (SNR = 10; σ is the standard deviation of the true conditional output mean). Bayesian weighted regression achieves the lowest prediction error.
Results: Synthetic Data Available Incrementally
[Figure: prediction error over time with outliers at least 2σ away (λ = 0.999); the Bayesian approach shows the lowest prediction error]
Results: Synthetic Data Available Incrementally
[Figure: prediction error over time with outliers at least 3σ away (λ = 0.999); the Bayesian approach shows the lowest prediction error]
Results: Robotic Orientation Data
• Offset between MOCAP data & IMU data for LittleDog:
Results: Predicted Output on LittleDog MOCAP Data
Conclusions
• We have presented an algorithm that:
  – Automatically detects outliers in real-time
  – Requires no user intervention, parameter tuning, or sampling
  – Performs on par with, and even exceeds, standard outlier detection methods
• Extensions to the Kalman filter and other filters are possible