![Page 1: Outlier detection and accommodation for business surveys utilizing multiple linear regression models in edit and imputation Robert Philips ICES-III June](https://reader036.vdocuments.mx/reader036/viewer/2022062309/5697bfd51a28abf838cad423/html5/thumbnails/1.jpg)
Outlier detection and accommodation for business surveys utilizing multiple linear regression models in edit and imputation
Robert Philips
ICES-III, June 21st, 2007
Presentation Outline
E & I for the Monthly Wholesale Retail Trade Survey (MWRTS)
Outlier Model and Theory
Illustrative Example
Outlier Procedure for “large” imputation cells
Simulation results
Conclusion
E & I for the MWRTS
Statistical edits are run prior to imputation and in part identify which of the respondent data will be used to impute non-respondents.
Statistical editing is done at the industrial grouping by geography level; if a group does not have enough units, it is collapsed over geography.
The Hidiroglou-Berthelot method (1986) is used in conjunction with monthly, yearly and administrative data trend edits.
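As a rough illustration of the kind of trend edit named above, the following is a minimal sketch of a Hidiroglou-Berthelot style ratio edit. The function name `hb_edit` and the parameter values `U`, `A` and `C` are illustrative assumptions, not the settings used in the MWRTS, and the data are assumed to be strictly positive.

```python
import numpy as np

def hb_edit(prev, curr, U=0.5, A=0.05, C=4.0):
    """Flag units whose period-to-period trend is unusual (HB-style edit).

    prev, curr: strictly positive values for the previous and current period.
    U, A, C: tuning constants (illustrative values, not survey settings).
    """
    prev = np.asarray(prev, dtype=float)
    curr = np.asarray(curr, dtype=float)
    r = curr / prev                       # trend ratios
    r_med = np.median(r)
    # Symmetric transformation so increases and decreases are treated alike
    s = np.where(r >= r_med, r / r_med - 1.0, 1.0 - r_med / r)
    # Size effect: larger units are given more importance
    e = s * np.maximum(prev, curr) ** U
    e_q1, e_med, e_q3 = np.percentile(e, [25, 50, 75])
    d_q1 = max(e_med - e_q1, abs(A * e_med))
    d_q3 = max(e_q3 - e_med, abs(A * e_med))
    lower, upper = e_med - C * d_q1, e_med + C * d_q3
    return (e < lower) | (e > upper)

# Ten units with steady growth, the last one quadrupling
prev = np.full(10, 100.0)
curr = np.array([100, 102, 104, 106, 108, 110, 112, 114, 116, 400], dtype=float)
flags = hb_edit(prev, curr)
```

Units flagged here would be excluded from the set of respondents used to impute non-respondents.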
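E & I for the MWRTS (cont.)
In general for most E & I classes in the MWRTS the model is of the following form:
$$y = X\beta + \sigma W^{-1/2}\varepsilon;$$
$$\text{(I)}\quad y_{t,i} = \beta_1\, y_{t-1,i} + \beta_2\, y_{t-12,i} + \varepsilon_i,$$
$$\text{(II)}\quad y_{t,i} = \beta_3\, y_{t-m,i} + \varepsilon_i,$$
each with observation weights $w_i$.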
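For a model of the form $y = X\beta + \sigma W^{-1/2}\varepsilon$ with $W$ a diagonal matrix of weights, the usual weighted least squares estimate is $\hat\beta = (X^tWX)^{-1}X^tWy$. A minimal sketch, with `wls_fit` as a hypothetical helper name:

```python
import numpy as np

def wls_fit(X, y, w):
    """Weighted least squares for y = X beta + sigma W^{-1/2} eps.

    X: (n, p) design matrix, y: (n,) response, w: (n,) positive weights.
    Returns the estimate of beta and the weighted sum of squared errors.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    XtW = X.T * w                          # same as X.T @ diag(w)
    beta = np.linalg.solve(XtW @ X, XtW @ y)
    resid = y - X @ beta
    sse = float(np.sum(w * resid ** 2))
    return beta, sse

# Toy data lying exactly on the line y = 2 + 3x
x = np.array([1.0, 2.0, 3.0, 4.0])
X = np.column_stack([np.ones_like(x), x])
beta, sse = wls_fit(X, 2.0 + 3.0 * x, w=np.array([1.0, 2.0, 1.0, 2.0]))
```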
E & I for the MWRTS (cont.)
The imputation classes are at a finer level of detail than the statistical edit groupings.
The principal method for imputation is the bivariate model (60%), and respondents who have passed the univariate statistical edits might actually be considered as outliers during the imputation process.
There is clearly a need for an outlier detection routine for the imputation module.
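Outlier Model and Theory
Let $M_{(i)}$, with $(i) = (i_1, \ldots, i_k)$, represent the model in which the observations $y_{i_m}$, $m = 1, \ldots, k$, are the outliers, while the majority of observations satisfy
$$y_{(i)} = X_{(i)}\beta + \sigma W_{(i)}^{-1/2}\varepsilon_{(i)},$$
where $y_{(i)}$ is the $(n-k) \times 1$ vector of $y$'s omitting $y_{i_1}, \ldots, y_{i_k}$, etc.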
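Outlier Model and Theory cont…
The outlier component for each $i_m$ is given a uniform distribution $U$, conditional on $W$.
Each model is equally likely:
$$P(M_{(i)} \mid k) = \binom{n}{k}^{-1}, \quad (i) \in S_{k,n}.$$
The priors for the parameters are:
$$k \sim \text{Poisson}(\lambda) \text{ for some } \lambda, \qquad p(\beta, \sigma^2) \propto \frac{1}{\sigma^2}, \quad \beta \in R^p,\ \sigma^2 \in R^+,$$
with $\varepsilon_i$ iid, $i = 1, \ldots, n$.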
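Outlier Model and Theory cont…
Let
$$q_{(i)} = \frac{|X^tWX|^{1/2}}{|X_{(i)}^t W_{(i)} X_{(i)}|^{1/2}} \left(\frac{S}{S_{(i)}}\right)^{(n-k-p)/2}, \qquad Q_k = \sum_{(i) \in S_{k,n}} q_{(i)},$$
where $S_{(i)}$ is the SSE after omitting $y_{i_1}, \ldots, y_{i_k}$ from the regression.
For $k = 0, 1, \ldots, k_{\max}$,
$$p_k^* = \text{Prob}(k \text{ outliers} \mid y, X, W) = C\,\frac{(0.1)^k}{k!}\,\frac{Q_k}{\binom{n}{k}}.$$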
Outlier Model and Theory cont…
The posterior estimate of the number of outliers is
$$k^* = E(k \mid y, X, W) = \sum_{k=0}^{k_{\max}} k\, p_k^*,$$
and
$$p_{(i)} = \text{Prob}(y_{i_1}, \ldots, y_{i_k} \text{ are the outliers} \mid k, X, W, y) = \frac{q_{(i)}}{Q_k}.$$
The set of indices where $\max_{(i)} p_{(i)}$ is attained determines the most likely outliers.
Outlier Model and Theory cont…
$$\beta \mid y, X, W, M_{(i)}, k \sim t_p \text{ variate}$$
with mean
$$\hat\beta_{(i)} = (X_{(i)}^t W_{(i)} X_{(i)})^{-1} X_{(i)}^t W_{(i)}\, y_{(i)}$$
and variance
$$\frac{S_{(i)}}{n-k-p-2}\,(X_{(i)}^t W_{(i)} X_{(i)})^{-1}.$$
The posterior estimate of $\beta$ is
$$\beta^* = \sum_{k} p_k^* \sum_{(i) \in S_{k,n}} p_{(i)}\, \hat\beta_{(i)}.$$
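Outlier Model and Theory cont…
The posterior variance of $\beta$ is
$$V(\beta \mid y, X, W) = \sum_k p_k^* D_k,$$
$$D_k = \sum_{(i) \in S_{k,n}} p_{(i)} \left[ \frac{S_{(i)}}{n-k-p-2}\,(X_{(i)}^t W_{(i)} X_{(i)})^{-1} + (\hat\beta_{(i)} - \beta^*)(\hat\beta_{(i)} - \beta^*)^t \right].$$
Similarly, the posterior estimate of $\sigma^2$ is
$$\sigma^{2*} = \sum_k p_k^* \sum_{(i) \in S_{k,n}} p_{(i)}\, \frac{S_{(i)}}{n-k-p-2}.$$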
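To make the subset enumeration concrete, here is a minimal sketch for a fixed number of outliers $k$. It fits the weighted regression with each candidate subset omitted, weighs the subsets, and averages the estimates of $\beta$. The weight used, proportional to $|X_{(i)}^tW_{(i)}X_{(i)}|^{-1/2} S_{(i)}^{-(n-k-p)/2}$, is an assumption made for illustration and is not confirmed as the exact weight in the presentation; `subset_posterior` is a hypothetical name.

```python
from itertools import combinations
import numpy as np

def subset_posterior(X, y, w, k):
    """Weigh every subset of k candidate outliers (assumed weight form)."""
    X, y, w = (np.asarray(a, dtype=float) for a in (X, y, w))
    n, p = X.shape
    subsets, log_q, betas = [], [], []
    for idx in combinations(range(n), k):
        keep = np.ones(n, dtype=bool)
        keep[list(idx)] = False            # omit the candidate outliers
        Xk, yk, wk = X[keep], y[keep], w[keep]
        XtW = Xk.T * wk
        A = XtW @ Xk                       # X_(i)^t W_(i) X_(i)
        beta = np.linalg.solve(A, XtW @ yk)
        sse = np.sum(wk * (yk - Xk @ beta) ** 2)
        _, logdet = np.linalg.slogdet(A)
        # Assumed log weight, up to a constant shared by all subsets of size k
        log_q.append(-0.5 * logdet - 0.5 * (n - k - p) * np.log(sse))
        subsets.append(idx)
        betas.append(beta)
    log_q = np.array(log_q)
    prob = np.exp(log_q - log_q.max())     # normalise on the log scale
    prob /= prob.sum()
    beta_avg = prob @ np.array(betas)      # weighted average of the estimates
    return subsets, prob, beta_avg

# Ten points near y = 2 + 3x, with the seventh shifted upward by 50
x = np.arange(1.0, 11.0)
X = np.column_stack([np.ones_like(x), x])
y = 2.0 + 3.0 * x + 0.1 * np.sin(x)
y[6] += 50.0
subsets, prob, beta_avg = subset_posterior(X, y, np.ones_like(x), k=1)
```

Working on the log scale avoids overflow when one subset fits far better than the rest. The loop visits every subset of size $k$, so the cost grows with the binomial coefficient $\binom{n}{k}$.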
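An illustrative example
$$y_i = 30 + 0.95\,x_i + 25\,\varepsilon_i, \quad i = 1, \ldots, 25,$$
$$x_i \sim \text{Exp}(200),$$
and the errors were from a contaminated normal:
$$\varepsilon_i \sim 0.95\,N(0, 1) + 0.05\,N(0, 10^2).$$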
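Data of this kind can be generated along the following lines. This is a sketch under stated assumptions: the exponential distribution is taken to have mean 200, the contaminating component is taken to have standard deviation 10, and the seed and the function name `simulate_example` are arbitrary, so the draws will not reproduce the presentation's data set.

```python
import numpy as np

def simulate_example(n=25, seed=0):
    """Simulate y = 30 + 0.95 x + 25 eps with contaminated-normal errors."""
    rng = np.random.default_rng(seed)
    x = rng.exponential(scale=200.0, size=n)   # assumption: mean 200
    contaminated = rng.random(n) < 0.05        # 5% contamination
    sd = np.where(contaminated, 10.0, 1.0)     # assumption: sd 10 vs. sd 1
    eps = rng.normal(0.0, sd)
    y = 30.0 + 0.95 * x + 25.0 * eps
    return x, y, contaminated

x, y, contaminated = simulate_example()
```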
Plot of Simulated data
[Figure: plot of Y against X for the simulated data, with observations 10, 19 and 25 labelled.]
Example cont…
| $k$ | $p_k^*$ |
|---|---|
| 0 | 0.0000 |
| 1 | 0.3168 |
| 2 | 0.3652 |
| 3 | 0.2143 |
| 4 | 0.1037 |
The posterior estimate of the number of outliers is 2.105, with estimates of 34.755 and 0.967 for the intercept and slope.
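The figure of 2.105 is the mean of the posterior distribution of the number of outliers. A quick check using the probabilities listed for $k = 0, \ldots, 4$:

```python
# Posterior probabilities of k outliers, as listed on the slide
p_star = {0: 0.0000, 1: 0.3168, 2: 0.3652, 3: 0.2143, 4: 0.1037}

# Posterior mean of k: sum over k of k * p*_k
k_star = sum(k * p for k, p in p_star.items())

# The single most probable number of outliers
k_mode = max(p_star, key=p_star.get)

print(round(k_star, 3), k_mode)
```

The posterior mean (about 2.105) and the most probable value ($k = 2$) point the same way here.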
Example cont…
The high-breakdown MM-estimator (Yohai 1987) indicates that observation 25 is an outlier with high leverage, while observation 10 only has high leverage.
The parameter estimates are intercept = 35.465 and slope = 0.9629.
Example cont…
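| i_1 | i_2 | p_i | Intercept | Slope |
|---|---|---|---|---|
| 19 | 25 | 0.713 | 33.492 | 0.965 |
| 16 | 25 | 0.036 | 37.393 | 0.973 |
| 1 | 25 | 0.029 | 40.039 | 0.958 |
| 15 | 25 | 0.021 | 37.090 | 0.955 |
| 6 | 25 | 0.019 | 31.667 | 0.983 |
| 22 | 25 | 0.015 | 37.280 | 0.970 |
| 4 | 25 | 0.014 | 37.524 | 0.968 |
| 5 | 25 | 0.013 | 38.032 | 0.965 |
| 13 | 25 | 0.013 | 32.614 | 0.980 |
| 10 | 25 | 0.012 | 33.747 | 0.987 |

If there are only two possible outliers, then (19, 25) are 60 times more likely than (10, 25)!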
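The factor of about 60 is the ratio of the two posterior probabilities for the pairs (19, 25) and (10, 25). A short check using the ten probabilities listed for $k = 2$:

```python
# Posterior probabilities of the ten most likely pairs, keyed by (i_1, i_2)
p_pair = {
    (19, 25): 0.713, (16, 25): 0.036, (1, 25): 0.029, (15, 25): 0.021,
    (6, 25): 0.019, (22, 25): 0.015, (4, 25): 0.014, (5, 25): 0.013,
    (13, 25): 0.013, (10, 25): 0.012,
}

# How much more likely (19, 25) is than (10, 25)
ratio = p_pair[(19, 25)] / p_pair[(10, 25)]

# Probability mass carried by the ten listed pairs
listed_mass = sum(p_pair.values())

print(round(ratio, 1), round(listed_mass, 3))
```

The ratio is a little under 60, and the ten listed pairs account for most, but not all, of the probability among the possible pairs.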
Plot of Simulated data
[Figure: plot of Y against X for the simulated data, with observations 10, 19 and 25 labelled.]
Outlier Model and Theory cont…
Strengths: method works well in detecting outliers and estimating the relevant parameters robustly. All of the data is used.
Drawback: method becomes impractical as the imputation class size n increases, since the number of possible subsets of size k will become astronomically large.
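The growth in the number of candidate subsets can be seen directly from the binomial coefficients. The sketch below counts all subsets of size 0 up to a maximum number of outliers; the cap of 4 and the class sizes are illustrative choices, and `n_models` is a hypothetical name.

```python
from math import comb

def n_models(n, k_max):
    """Number of candidate outlier subsets of size 0, 1, ..., k_max."""
    return sum(comb(n, k) for k in range(k_max + 1))

# Number of regressions to fit for several imputation class sizes
for n in (25, 50, 100, 500):
    print(n, n_models(n, 4))
```

Each candidate subset requires its own regression fit, so the work grows roughly like $n^{k_{\max}}$.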
Outlier Procedure for “large” imputation cells
For $n$ large, on each pass (with $k = 0, 1$) test $p_0^*$, or use $p_{(i_m)}$ and the leverage $h_{i_m i_m}$, to identify the outliers.

| ident | hii | pi | P_0 | b0 | b1 | outlier |
|---|---|---|---|---|---|---|
| 25 | 0.1849 | 1.0000 | 0.0000 | 35.6930 | 0.9708 | Y |
| 19 | 0.0443 | 0.6768 | 0.6676 | 35.2458 | 0.9695 | Y |
| 16 | 0.0443 | 0.1338 | 0.8207 | 33.5584 | 0.9651 | N |

The standard errors for $\beta_0$ and $\beta_1$ are 8.5023 and 0.0343 respectively. The estimate $\sigma^*$ is 25.0655.
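The leverage values can be computed from the design matrix and the weights alone. A minimal sketch, assuming the usual weighted-regression definition $h_{ii} = w_i\, x_i^t (X^tWX)^{-1} x_i$ (the presentation does not spell out the definition it uses); `weighted_leverage` is a hypothetical name.

```python
import numpy as np

def weighted_leverage(X, w):
    """Diagonal of the weighted hat matrix, h_ii = w_i x_i^t (X^t W X)^{-1} x_i."""
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)
    XtW = X.T * w
    A_inv = np.linalg.inv(XtW @ X)
    # Quadratic form x_i^t A_inv x_i for every row i at once
    return w * np.einsum("ij,jk,ik->i", X, A_inv, X)

# One point far from the rest in x has by far the largest leverage
x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
X = np.column_stack([np.ones_like(x), x])
h = weighted_leverage(X, np.ones_like(x))
```

The leverages sum to the number of regression parameters, which gives a simple sanity check on the computation.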
Simulation results
Data from the MRTS were selected from 3 imputation classes in which the number of respondents for the bivariate imputation model exceeded 50.
For a given simulation, (1-p)% of the units in each cell were selected to impute for the remaining units.
The method presented here was compared to the MM M-estimator using the relative difference of the average predictions.
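One plausible reading of this comparison measure is the difference between the average prediction and the average true value, relative to the average true value. The definition below is an assumption, since the presentation does not give the formula, and `relative_difference` is a hypothetical name.

```python
import numpy as np

def relative_difference(predicted, actual):
    """(mean of predictions - mean of actual values) / mean of actual values."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return (predicted.mean() - actual.mean()) / actual.mean()

# Predictions averaging 101 against true values averaging 100
rd = relative_difference([98.0, 102.0, 103.0], [100.0, 100.0, 100.0])
```

Under this reading, a value close to zero means the imputed values reproduce the average of the held-out units well.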
Simulation results for 200 runs

| p% | Method | Cell 1 | Cell 2 | Cell 3 |
|---|---|---|---|---|
| 5 | MRTS | -0.002 | -0.020 | 0.013 |
| 5 | MM | -0.008 | -0.028 | 0.001 |
| 10 | MRTS | -0.000 | -0.007 | 0.015 |
| 10 | MM | -0.008 | -0.015 | 0.003 |
| 15 | MRTS | -0.000 | -0.011 | 0.014 |
| 15 | MM | -0.008 | -0.016 | 0.002 |
Conclusions
The procedure for outlier detection works well and produces fairly robust estimates. It would also allow for more covariates to be included in the E&I process.
Even though the assumption of normality led to the closed form solution of the estimator, it is still applicable to situations where modest departures from normality arise.
For more information, please contact
Robert Philips, e-mail: [email protected], telephone: (613) 951-1493
www.statcan.ca
Thank you!