TRANSCRIPT
Automatic Outlier Detection: A Bayesian Approach
Jo-Anne Ting, University of Southern California
Aaron D'Souza, Google, Inc.
Stefan Schaal, University of Southern California
ICRA 2007, April 12, 2007
J. Ting 2
Outline
• Motivation
• Past & related work
• Bayesian regression for automatic outlier detection
  – Batch version
  – Incremental version
• Results
  – Synthetic data
  – Robotic data
• Conclusions
Motivation
• Real-world sensor data is susceptible to outliers
  – E.g., motion capture (MOCAP) data of a robotic dog
Past & Related Work
• Current methods for outlier detection may:
  – Require parameter tuning (i.e., an optimal threshold)
  – Require sampling (e.g., active sampling, Abe et al., 2006) or the setting of certain parameters, e.g., k in k-means clustering (MacQueen, 1967)
  – Assume an underlying data structure (e.g., mixture models, Fox et al., 1999)
  – Adopt a weighted linear regression model, but model the weights with some heuristic function (e.g., robust least squares, Hoaglin, 1983)
Bayesian Regression for Automatic Outlier Detection
• Consider linear regression:

  $$y_i = b^T x_i + \epsilon_i$$

• We can modify the above to get a weighted linear regression model (Gelman et al., 1995):

  $$y_i \sim \mathrm{Normal}\!\left(b^T x_i,\ \frac{\sigma^2}{w_i}\right), \qquad b \sim \mathrm{Normal}\!\left(b_0,\ \sigma^2 \Sigma_{b_0}\right)$$

  Except now:

  $$w_i \sim \mathrm{Gamma}\!\left(a_{w_i}, b_{w_i}\right)$$
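A minimal sketch of sampling from this generative model may help fix intuition. The dimensions, the regression vector, and the Gamma hyperparameter values below are illustrative assumptions, not values from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative settings (assumed): 5 inputs, 1000 samples
d, N = 5, 1000
b_true = rng.normal(size=d)        # regression vector b
sigma2 = 0.1                       # noise variance sigma^2
a_w, b_w = 2.0, 2.0                # Gamma hyperparameters (assumed values)

X = rng.normal(size=(N, d))
# w_i ~ Gamma(a_w, b_w); b_w treated here as a rate parameter
w = rng.gamma(shape=a_w, scale=1.0 / b_w, size=N)
# y_i ~ Normal(b^T x_i, sigma^2 / w_i): a small w_i inflates the noise
# variance of that sample, which is how this model accounts for outliers
y = X @ b_true + rng.normal(scale=np.sqrt(sigma2 / w))
```

The key design choice is that each sample gets its own precision scale w_i, so an outlier is simply a point whose inferred w_i is small.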
Bayesian Regression for Automatic Outlier Detection
• This Bayesian treatment of weighted linear regression:
  – Is suitable for real-time outlier detection
  – Requires no model assumptions
  – Requires no parameter tuning
Bayesian Regression for Automatic Outlier Detection
• Our goal is to infer the posterior distributions of b and w
• We can treat this as an EM problem (Dempster et al., 1977) and maximize the incomplete log likelihood:

  $$\log p(y \mid X)$$

  by maximizing the expected complete log likelihood:

  $$E\!\left[\log p(y, b, w \mid X)\right]$$
Bayesian Regression for Automatic Outlier Detection
• In the E-step, we need to calculate:

  $$E_{Q(b,w)}\!\left[\log p(y, b, w \mid X)\right]$$

  but since the true posterior over all hidden variables is analytically intractable, we make a factorial variational approximation (Hinton & van Camp, 1993; Ghahramani & Beal, 2000):

  $$Q(b, w) = Q(b)\,Q(w)$$
Bayesian Regression for Automatic Outlier Detection
• EM update equations (batch version):

  E-step:

  $$\Sigma_b = \left( \Sigma_{b_0}^{-1} + \sum_{i=1}^{N} w_i x_i x_i^T \right)^{-1}$$

  $$b = \Sigma_b \left( \Sigma_{b_0}^{-1} b_0 + \sum_{i=1}^{N} w_i y_i x_i \right)$$

  $$w_i = \frac{a_{w_i,0} + \tfrac{1}{2}}{b_{w_i,0} + \dfrac{1}{2\sigma^2}\left(y_i - b^T x_i\right)^2 + \dfrac{1}{2}\, x_i^T \Sigma_b x_i}$$

  M-step:

  $$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} \left[ w_i \left(y_i - b^T x_i\right)^2 + w_i\, x_i^T \Sigma_b x_i \right]$$

• Reminder: $y_i \sim \mathrm{Normal}\!\left(b^T x_i,\ \sigma^2 / w_i\right)$
• If the prediction error is very large, $E[w_i]$ goes to 0, so the point is downweighted
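The batch updates above can be sketched directly in NumPy. The prior values (b_0, Σ_{b_0}, a_{w,0}, b_{w,0}) and the iteration count below are illustrative assumptions, not the talk's settings:

```python
import numpy as np

def bayesian_weighted_regression(X, y, n_iter=100):
    """Sketch of the batch EM updates for Bayesian weighted linear
    regression; returns the posterior mean b, the expected weights
    E[w_i], and the noise variance sigma^2."""
    N, d = X.shape
    b0 = np.zeros(d)                    # assumed prior mean on b
    Sigma_b0_inv = 1e-3 * np.eye(d)     # assumed broad prior precision
    a_w0, b_w0 = 1.0, 1.0               # assumed Gamma prior on each w_i
    w = np.ones(N)
    sigma2 = float(np.var(y)) + 1e-8

    for _ in range(n_iter):
        # E-step: posterior covariance and mean of b given current E[w_i]
        Sigma_b = np.linalg.inv(Sigma_b0_inv + (w[:, None] * X).T @ X)
        b = Sigma_b @ (Sigma_b0_inv @ b0 + X.T @ (w * y))
        # E-step: expected weights; a large residual drives E[w_i] to 0
        resid2 = (y - X @ b) ** 2
        quad = np.einsum("ij,jk,ik->i", X, Sigma_b, X)  # x_i^T Sigma_b x_i
        w = (a_w0 + 0.5) / (b_w0 + resid2 / (2 * sigma2) + quad / 2)
        # M-step: noise variance
        sigma2 = float(np.mean(w * resid2 + w * quad))
    return b, w, sigma2
```

On data contaminated with outliers, the returned weights separate cleanly: inliers keep weights near the prior mean while outliers are driven toward zero, which is exactly the downweighting effect noted on the slide.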
Bayesian Regression for Automatic Outlier Detection
• EM update equations (incremental version):

  E-step:

  $$\Sigma_b = \left( \Sigma_{b_0}^{-1} + \Theta_k^{wxx^T} \right)^{-1}$$

  $$b = \Sigma_b \left( \Sigma_{b_0}^{-1} b_0 + \Theta_k^{wyx} \right)$$

  $$w_i = \frac{a_{w_i,0} + \tfrac{1}{2}}{b_{w_i,0} + \dfrac{1}{2\sigma^2}\left(y_i - b^T x_i\right)^2 + \dfrac{1}{2}\, x_i^T \Sigma_b x_i}$$

  M-step:

  $$\sigma^2 = \frac{1}{N_k} \left[ \Theta_k^{wy^2} - 2\, b^T \Theta_k^{wyx} + b^T \Theta_k^{wxx^T} b + \mathbf{1}^T \operatorname{diag}\!\left\{ \Theta_k^{wxx^T} \Sigma_b \right\} \right]$$

• Sufficient statistics are exponentially discounted by λ, 0 ≤ λ ≤ 1 (e.g., Ljung & Söderström, 1983):

  $$N_k = 1 + \lambda N_{k-1}$$
  $$\Theta_k^{wxx^T} = w_k x_k x_k^T + \lambda\, \Theta_{k-1}^{wxx^T}$$
  $$\Theta_k^{wyx} = w_k y_k x_k + \lambda\, \Theta_{k-1}^{wyx}$$
  $$\Theta_k^{wy^2} = w_k y_k^2 + \lambda\, \Theta_{k-1}^{wy^2}$$
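The incremental variant can be sketched as a small class that keeps only the discounted sufficient statistics, so each new sample costs O(d²) plus one d×d inverse. The priors and the discount factor below are illustrative assumptions:

```python
import numpy as np

class IncrementalOutlierRegression:
    """Sketch of the incremental EM updates with sufficient statistics
    exponentially discounted by lambda (assumed priors, not the talk's)."""

    def __init__(self, d, lam=0.999, a_w0=1.0, b_w0=1.0):
        self.lam = lam
        self.a_w0, self.b_w0 = a_w0, b_w0
        self.b0 = np.zeros(d)               # assumed prior mean on b
        self.Sigma_b0_inv = 1e-3 * np.eye(d)
        self.N = 0.0                        # N_k
        self.Swxx = np.zeros((d, d))        # Theta_k^{w x x^T}
        self.Swyx = np.zeros(d)             # Theta_k^{w y x}
        self.Swy2 = 0.0                     # Theta_k^{w y^2}
        self.b = np.zeros(d)
        self.Sigma_b = np.linalg.inv(self.Sigma_b0_inv)
        self.sigma2 = 1.0

    def update(self, x, y):
        # E-step: weight of the new point under the current model;
        # a large prediction error drives w toward 0 (outlier)
        r2 = float(y - self.b @ x) ** 2
        quad = float(x @ self.Sigma_b @ x)
        w = (self.a_w0 + 0.5) / (self.b_w0 + r2 / (2 * self.sigma2) + quad / 2)
        # Exponentially discounted sufficient statistics
        self.N = 1.0 + self.lam * self.N
        self.Swxx = w * np.outer(x, x) + self.lam * self.Swxx
        self.Swyx = w * y * x + self.lam * self.Swyx
        self.Swy2 = w * y * y + self.lam * self.Swy2
        # E-step: posterior over b from the statistics
        self.Sigma_b = np.linalg.inv(self.Sigma_b0_inv + self.Swxx)
        self.b = self.Sigma_b @ (self.Sigma_b0_inv @ self.b0 + self.Swyx)
        # M-step: noise variance from the discounted statistics
        self.sigma2 = max(
            (self.Swy2 - 2 * self.b @ self.Swyx
             + self.b @ self.Swxx @ self.b
             + np.trace(self.Swxx @ self.Sigma_b)) / self.N,
            1e-8,
        )
        return w
```

Because old statistics decay by λ at every step, the fit tracks slow drift in the data while the per-sample weight w flags outliers on the fly.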
Results: Synthetic Data
• Given noisy data (+ outliers) from a linear regression problem:
  – 5 input dimensions
  – 1000 samples
  – SNR = 10
  – 20% outliers
  – Outliers are 3σ from the output mean
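A hypothetical reconstruction of this benchmark setup is sketched below; the exact generator used in the experiments is not shown in the talk, so the regression vector, input distribution, and contamination scheme are assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)
d, N, snr, outlier_frac = 5, 1000, 10.0, 0.20

# Assumed generator: standard-normal inputs, random linear map
b_true = rng.normal(size=d)
X = rng.normal(size=(N, d))
signal = X @ b_true

# SNR = signal power / noise power
noise_var = signal.var() / snr
y = signal + rng.normal(scale=np.sqrt(noise_var), size=N)

# Contaminate 20% of samples, placed 3 sigma from the output mean
# (sigma = std. dev. of the true conditional output mean, per the slides)
idx = rng.choice(N, size=int(outlier_frac * N), replace=False)
mu_y, sigma_y = y.mean(), signal.std()
y[idx] = mu_y + 3 * sigma_y * rng.choice([-1.0, 1.0], size=idx.size)
```

Data of this shape is what the batch and incremental algorithms are compared on in the following results.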
Results: Synthetic Data Available in Batch Form
Average normalized mean squared prediction error as a function of how far outliers are from inliers (distance of outliers from the mean is at least +σ, +2σ, +3σ):

  Algorithm                                  |   +σ   |  +2σ   |  +3σ
  -------------------------------------------|--------|--------|-------
  Mixture model                              | 0.0286 | 0.0688 | 0.1327
  Robust least squares                       | 0.0880 | 0.1518 | 0.1890
  Robust regression (Faul & Tipping, 2001)   | 0.0282 | 0.0683 | 0.1320
  Bayesian weighted regression               | 0.0210 | 0.0270 | 0.0273
  Thresholding (optimally tuned)             | 0.0232 | 0.0503 | 0.0903

Data: globally linear data with 5 input dimensions evaluated in batch form, averaged over 10 trials (SNR = 10; σ is the standard deviation of the true conditional output mean). Bayesian weighted regression achieves the lowest prediction error.
Results: Synthetic Data Available Incrementally
[Figure: prediction error over time with outliers at least 2σ away (λ = 0.999); the Bayesian approach shows the lowest prediction error]
Results: Synthetic Data Available Incrementally
[Figure: prediction error over time with outliers at least 3σ away (λ = 0.999); the Bayesian approach shows the lowest prediction error]
Results: Robotic Orientation Data
• Offset between MOCAP data & IMU data for LittleDog:
Results: Predicted Output on LittleDog MOCAP Data
Conclusions
• We have presented an algorithm that:
  – Automatically detects outliers in real-time
  – Requires no user intervention, parameter tuning, or sampling
  – Performs on par with, and even exceeds, standard outlier detection methods
• Extensions to the Kalman filter and other filters are possible