
Robust strategies and model selection

Stefan Van Aelst

Department of Applied Mathematics and Computer Science, Ghent University, Belgium

[email protected]

ERCIM09 - COMISEF/COST Tutorial


Outline

1 Regression model

2 Least squares

3 Manual variable selection approach

4 Automatic variable selection approach

5 Robustness

6 Robust variable selection: sequencing

7 Robust variable selection: segmentation


Regression model

Regression setting

Consider a dataset $Z_n = \{(y_i, x_{i1}, \ldots, x_{id}) = (y_i, x_i);\; i = 1, \ldots, n\} \subset \mathbb{R}^{d+1}$.

$Y$ is the response variable

$X_1, \ldots, X_d$ are the candidate regressors

The corresponding linear model is:

$$y_i = \beta_1 x_{i1} + \cdots + \beta_d x_{id} + \varepsilon_i, \qquad i = 1, \ldots, n$$
$$y_i = x_i^t \beta + \varepsilon_i, \qquad i = 1, \ldots, n$$

where the errors $\varepsilon_i$ are assumed to be i.i.d. with $E(\varepsilon_i) = 0$ and $\mathrm{Var}(\varepsilon_i) = \sigma^2 > 0$.

Estimate the regression coefficients β from the data.


Least squares

Least squares solution

$\hat{\beta}_{LS}$ solves
$$\min_\beta \sum_{i=1}^n (y_i - x_i^t \beta)^2$$

Write $X = (x_1, \ldots, x_n)^t$ and $y = (y_1, \ldots, y_n)^t$. Then $\hat{\beta}_{LS}$ solves
$$\min_\beta (y - X\beta)^t (y - X\beta)$$

$$\hat{\beta}_{LS} = (X^t X)^{-1} X^t y$$
$$\hat{y} = X \hat{\beta}_{LS} = X (X^t X)^{-1} X^t y = H y$$
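A minimal numpy sketch of these formulas (illustrative only; the data are simulated):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                  # design matrix (x_1, ..., x_n)^t
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=100)

# beta_LS = (X^t X)^{-1} X^t y; lstsq solves the same problem more stably
beta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

# Hat matrix H = X (X^t X)^{-1} X^t, so that y_hat = H y
H = X @ np.linalg.solve(X.T @ X, X.T)
y_hat = H @ y
```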


Least squares

Least squares properties

Unbiased estimator: $E(\hat{\beta}_{LS}) = \beta$

Gauss-Markov theorem: LS has the smallest variance among all unbiased linear estimators of $\beta$.

Why do variable selection?


Least squares

Expected prediction error

Assume the true regression function is linear: $Y|x = f(x) + \varepsilon = x^t\beta + \varepsilon$

Predict the response $Y_0$ at $x_0$: $Y_0 = x_0^t\beta + \varepsilon_0 = f(x_0) + \varepsilon_0$

Use an estimator of the regression coefficients: $\hat{\beta}$

Estimated prediction: $\hat{f}(x_0) = x_0^t\hat{\beta}$

Expected prediction error: $E[(Y_0 - \hat{f}(x_0))^2]$


Least squares

Expected prediction error

$$E[(Y_0 - \hat{f}(x_0))^2] = E[(f(x_0) + \varepsilon_0 - \hat{f}(x_0))^2]$$
$$= \sigma^2 + E[(f(x_0) - \hat{f}(x_0))^2]$$
$$= \sigma^2 + \mathrm{MSE}(\hat{f}(x_0))$$

$\sigma^2$: irreducible variance of the new observation $Y_0$

$\mathrm{MSE}(\hat{f}(x_0))$: mean squared error of the prediction at $x_0$ by the estimator $\hat{f}$


Least squares

MSE of a prediction

$$\mathrm{MSE}(\hat{f}(x_0)) = E[(\hat{f}(x_0) - f(x_0))^2]$$
$$= E[[x_0^t(\hat{\beta} - \beta)]^2]$$
$$= E[[x_0^t(\hat{\beta} - E(\hat{\beta}) + E(\hat{\beta}) - \beta)]^2]$$
$$= \mathrm{bias}(\hat{f}(x_0))^2 + \mathrm{Var}(\hat{f}(x_0))$$

LS is unbiased $\Rightarrow \mathrm{bias}(\hat{f}(x_0)) = 0$

LS minimizes $\mathrm{Var}(\hat{f}(x_0))$ (Gauss-Markov)

LS has the smallest MSPE among all linear unbiased estimators


Least squares

LS instability

LS becomes unstable, with large MSPE, if $\mathrm{Var}(\hat{f}(x_0))$ is high. This can happen if there are:

Many noise variables among the candidate regressors

Highly correlated predictors (multicollinearity)

⇒ Improve on the least squares MSPE by trading (a little) bias for (a lot of) variance!


Manual variable selection approach

Manual variable selection

Try to determine the set of the most important regressors

Remove the noise regressors from the model

Avoid multicollinearity

Methods

All subsets

Backward elimination

Forward selection

Stepwise selection

→ choose a selection criterion


Manual variable selection approach

Submodels

Dataset $Z_n = \{(y_i, x_{i1}, \ldots, x_{id}) = (y_i, x_i);\; i = 1, \ldots, n\} \subset \mathbb{R}^{d+1}$.

Let $\alpha \subset \{1, \ldots, d\}$ denote the predictors included in a submodel.

The corresponding submodel is:

$$y_i = x_{\alpha i}^t \beta_\alpha + \varepsilon_{\alpha i}, \qquad i = 1, \ldots, n.$$

A selected model is considered a good model if

It is parsimonious

It fits the data well

It yields good predictions for similar data


Manual variable selection approach

Some standard selection criteria

Adjusted $R^2$: $A(\alpha) = 1 - \dfrac{\mathrm{RSS}(\alpha)/(n - d(\alpha))}{\mathrm{RSS}(1)/(n - 1)}$

Mallows' $C_p$: $C(\alpha) = \dfrac{\mathrm{RSS}(\alpha)}{\hat{\sigma}^2} - (n - 2d(\alpha))$

Final Prediction Error: $\mathrm{FPE}(\alpha) = \dfrac{\mathrm{RSS}(\alpha)}{\hat{\sigma}^2} + 2d(\alpha)$

AIC: $\mathrm{AIC}(\alpha) = -2L(\alpha) + 2d(\alpha)$

BIC: $\mathrm{BIC}(\alpha) = -2L(\alpha) + \log(n)\, d(\alpha)$

where $\hat{\sigma}$ is the residual scale estimate in the "full" model
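A small Python sketch of these criteria (illustrative only; the AIC and BIC lines use the Gaussian log-likelihood up to additive constants, which does not change the ranking of submodels, and RSS(1) is read here as the residual sum of squares of the intercept-only model):

```python
import numpy as np

def selection_criteria(rss_alpha, d_alpha, rss_full, d_full, rss_intercept, n):
    """Classical selection criteria for a submodel with d_alpha coefficients."""
    sigma2 = rss_full / (n - d_full)      # residual scale of the "full" model
    adj_r2 = 1 - (rss_alpha / (n - d_alpha)) / (rss_intercept / (n - 1))
    cp  = rss_alpha / sigma2 - (n - 2 * d_alpha)   # Mallows' Cp
    fpe = rss_alpha / sigma2 + 2 * d_alpha         # Final Prediction Error
    aic = n * np.log(rss_alpha / n) + 2 * d_alpha
    bic = n * np.log(rss_alpha / n) + np.log(n) * d_alpha
    return adj_r2, cp, fpe, aic, bic
```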


Manual variable selection approach

Resampling based selection criteria

Consider the (conditional) expected prediction error:

$$\mathrm{PE}(\alpha) = E\left[\frac{1}{n}\sum_{i=1}^n \left(z_i - x_{\alpha i}^t \hat{\beta}_\alpha\right)^2 \,\middle|\, y, X\right],$$

where the $z_i$ are new responses at the observed design points.

Estimates of the PE can be used as a selection criterion.

Estimates can be obtained by cross-validation or the bootstrap.

A more advanced selection criterion takes both goodness-of-fit and PE into account:

$$\mathrm{PPE}(\alpha) = \frac{1}{n}\sum_{i=1}^n \left(y_i - x_{\alpha i}^t \hat{\beta}_\alpha\right)^2 + f(n)\, d(\alpha) + E\left[\frac{1}{n}\sum_{i=1}^n \left(z_i - x_{\alpha i}^t \hat{\beta}_\alpha\right)^2 \,\middle|\, y, X\right]$$


Automatic variable selection approach

Automatic variable selection

Try to find a stable model that fits the data well

Shrinkage: constrained least squares optimization

Stagewise forward procedures

Methods

Ridge regression

Lasso

Least Angle regression

L2 Boosting

Elastic Net


Automatic variable selection approach

Lasso

Least Absolute Shrinkage and Selection Operator

$$\hat{\beta}_{lasso} = \arg\min_\beta \sum_{i=1}^n \Big(y_i - \beta_0 - \sum_{j=1}^d \beta_j x_{ij}\Big)^2 \quad \text{subject to} \quad \|\beta\|_1 = \sum_{j=1}^d |\beta_j| \le t$$

$0 < t < \|\hat{\beta}_{LS}\|_1$ is a tuning parameter


Automatic variable selection approach

Example: LASSO fits

[Figure: LASSO coefficient paths — standardized coefficients versus degrees of freedom, with variables entering the model one by one]


Automatic variable selection approach

Least angle regression

Standardize the variables.

1 Select $x_1$ such that $|\mathrm{cor}(y, x_1)| = \max_j |\mathrm{cor}(y, x_j)|$.

2 Put $r = y - \gamma x_1$ where $\gamma$ is determined such that
$$|\mathrm{cor}(r, x_1)| = \max_{j \ne 1} |\mathrm{cor}(r, x_j)|.$$

3 Select $x_2$ corresponding to the maximum above. Determine the equiangular direction $b$ such that $x_1^t b = x_2^t b$.

4 Put $r = r - \gamma b$ where $\gamma$ is determined such that
$$|\mathrm{cor}(r, x_1)| = |\mathrm{cor}(r, x_2)| = \max_{j \ne 1, 2} |\mathrm{cor}(r, x_j)|.$$

5 Continue the procedure . . .
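The path need not be coded by hand: scikit-learn's `lars_path` implements this algorithm. A small usage sketch on simulated data (illustrative only):

```python
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(1)
X = rng.normal(size=(150, 8))
y = X[:, 0] - 2 * X[:, 3] + 0.5 * rng.normal(size=150)

# method="lar" follows the least angle path; method="lasso" gives the
# LASSO modification mentioned on the next slide.
alphas, active, coefs = lars_path(X, y, method="lar")
print(active)   # indices of the predictors, in their order of entry
```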


Automatic variable selection approach

Properties of LAR

Least angle regression (LAR) selects the predictors in order of importance.

LAR changes the contributions of the predictors gradually, as they are needed.

LAR is very similar to the LASSO and can easily be adjusted to produce the LASSO solution.

LAR only uses the means, variances and correlations of the variables.

LAR is computationally as efficient as LS.


Automatic variable selection approach

Example: LAR fits

[Figure: LAR coefficient paths — standardized coefficients versus degrees of freedom]


Automatic variable selection approach

L2 boosting

Standardize the variables.

1 Put $r = y$ and $F_0 = 0$

2 Select $x_1$ such that $|\mathrm{cor}(r, x_1)| = \max_j |\mathrm{cor}(r, x_j)|$.

3 Update $r = y - \nu \hat{f}(x_1)$, where $0 < \nu \le 1$ is the step length and $\hat{f}(x_1)$ are the fitted values from the LS regression of $y$ on $x_1$. Similarly, update $F_1 = F_0 + \nu \hat{f}(x_1)$.

4 Continue the procedure . . .
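A minimal Python sketch of this componentwise L2 boosting loop (illustrative only; the columns of X and the response are assumed standardized):

```python
import numpy as np

def l2_boost(X, y, nu=0.1, steps=50):
    """Componentwise L2 boosting, keeping the invariant r = y - F."""
    n, d = X.shape
    r = y.astype(float).copy()        # step 1: r = y and F_0 = 0
    F = np.zeros(n)
    path = []
    for _ in range(steps):
        cors = [abs(np.corrcoef(r, X[:, j])[0, 1]) for j in range(d)]
        j = int(np.argmax(cors))                     # step 2: best-correlated x_j
        gamma = (X[:, j] @ r) / (X[:, j] @ X[:, j])  # LS fit of r on x_j
        r -= nu * gamma * X[:, j]                    # step 3: shrunken update
        F += nu * gamma * X[:, j]
        path.append(j)
    return F, path
```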


Automatic variable selection approach

Sequencing variables

Several selection algorithms sequence the predictors in "order of importance" or screen out the most relevant variables

Forward/stepwise selection

Stagewise forward selection

Penalty methods

Least angle regression

L2 boosting

These methods are computationally very efficient because they are only based on means, variances and correlations.


Robustness

Robustness: Data with outliers

Question: Number of partners men and women desire to have in the next 30 years?

Men: Mean = 64.3, Median = 1
→ The mean is sensitive to outliers
→ The median is robust and thus more reliable


Robustness

Least squares regression

[Figure: scatterplot of log light intensity versus log surface temperature, with the LS fit]

LS: Minimize $\sum_i r_i^2(\beta)$


Robustness

Outliers

[Figure: the same scatterplot with outliers present; the LS fit is pulled toward them]

Outliers attract LS!


Robustness

Robust regression estimators

[Figure: the scatterplot with both the LS fit and the robust MM fit; the MM line follows the bulk of the data]

Robust MM estimator is less influenced by outliers!


Robustness

Robust univariate location estimators

The sample mean $\bar{X}_n$ satisfies the equation
$$\sum_{i=1}^n (X_i - \bar{X}_n) = 0$$

The ML estimator $\hat{\theta}$ solves the equation
$$\sum_{i=1}^n \frac{\partial}{\partial\theta} \log f_\theta(X_i)\Big|_{\theta=\hat{\theta}} = 0$$

For a suitable score function $\psi(x, \theta)$, the M-estimator $T_n$ solves the equation
$$\sum_{i=1}^n \psi(X_i - T_n) = 0$$


Robustness

Univariate location M-estimators

$$\sum_{i=1}^n \psi(X_i - T_n) = 0$$

Consistent if $\int \psi(y)\, dF(y) = E_F(\psi(y)) = 0$

Asymptotic efficiency: $\dfrac{\left(\int \psi'\, d\Phi\right)^2}{\int \psi^2\, d\Phi}$

Robustness: Maximal breakdown point (50%) if $\psi(y)$ is bounded!


Robustness

Examples of M-estimators

Sample mean: $\psi(t) = t$. Unbounded! Efficiency: 100%

Median: $\psi(t) = \mathrm{sign}(t)$. Bounded, efficiency: 63.7%

Huber estimator: with $b > 0$,
$$\psi_b(t) = \min\{b, \max\{t, -b\}\} = \begin{cases} t & \text{if } |t| \le b \\ \mathrm{sign}(t)\, b & \text{if } |t| > b \end{cases}$$
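A Python sketch of the Huber location M-estimator, computed by iteratively reweighted averaging (illustrative only; following common practice, the deviations are standardized by the MAD, which the estimating equation on the previous slide leaves implicit):

```python
import numpy as np

def huber_location(x, b=1.345, tol=1e-8, max_iter=100):
    """Solve sum(psi_b((x_i - T)/s)) = 0 with s fixed at the MAD.

    The weights psi_b(t)/t equal 1 inside [-b, b] and b/|t| outside,
    so each iteration is a weighted average of the observations."""
    s = 1.483 * np.median(np.abs(x - np.median(x)))
    T = np.median(x)                      # robust starting value
    for _ in range(max_iter):
        t = (x - T) / s
        w = np.where(np.abs(t) <= b, 1.0, b / np.abs(t))
        T_new = np.sum(w * x) / np.sum(w)
        if abs(T_new - T) < tol * s:
            break
        T = T_new
    return T_new
```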


Robustness

Huber psi function

[Figure: the Huber ψ function — linear on [−b, b], constant at ±b outside]


Robustness

Tuning the Huber M-estimator

The Huber M-estimator has maximal breakdown point for any $b < \infty$
→ $b$ can be chosen for good efficiency at $\Phi$

$b = 1.345$ yields 95% efficiency
→ trade-off between robustness and efficiency!


Robustness

Example: Copper content in flour

Copper content (parts per million) in 24 wholemeal flour samples

[Figure: boxplot of the 24 copper content measurements]


Robustness

Example: Copper content in flour

Copper content (parts per million) in 24 wholemeal flour samples

Sample mean: 4.28

Sample median: 3.39

Huber M-estimator: 3.21


Robustness

Monotone M-estimates

Huber M-estimator has a monotone psi-function

If the function $\psi(t)$ is monotone, then

The equation $\sum_{i=1}^n \psi(X_i - T_n) = 0$ has a unique solution

$T_n$ is easy to compute

$T_n$ has maximal breakdown point

Large outliers still affect the estimate (although the effect remains bounded)


Robustness

Redescending M-estimates

If the function $\psi(t)$ is not monotone, but redescends to zero, then

The equation $\sum_{i=1}^n \psi(X_i - T_n) = 0$ has multiple solutions

Define $\rho(t)$ such that $\rho'(t) = \psi(t)$; then we need the solution of
$$\min_{T_n} \sum_{i=1}^n \rho(X_i - T_n)$$

$T_n$ can be more difficult to compute

$T_n$ has maximal breakdown point

The effect of large outliers on the estimate reduces to zero!

Increased robustness against large outliers


Robustness

Redescending M-estimates

A popular family of redescending loss functions is the Tukey biweight (bisquare) family of loss functions:

$$\rho_c(t) = \begin{cases} \dfrac{t^2}{2} - \dfrac{t^4}{2c^2} + \dfrac{t^6}{6c^4} & \text{if } |t| \le c \\[1ex] \dfrac{c^2}{6} & \text{if } |t| > c. \end{cases}$$

The constant $c$ can be tuned for efficiency


Robustness

Tukey biweight ρ functions

[Figure: Tukey biweight ρ functions for c = 2, c = 3 and c = ∞]


Robustness

Tukey biweight ψ function

[Figure: Huber and Tukey biweight ψ functions — the Huber ψ stays constant beyond ±b, while the Tukey ψ redescends to zero beyond ±c]


Robustness

Example: Copper content in flour

Copper content (parts per million) in 24 wholemeal flour samples

Sample mean: 4.28

Sample median: 3.39

Huber M-estimator: 3.21

Tukey biweight M-estimator: 3.16


Robustness

Univariate scale estimators

Example: Copper content (parts per million) in 24 wholemeal flour samples

Standard deviation: 5.30

Median absolute deviation (MAD):
$$S_n = 1.483\, \mathrm{med}_i\big(|X_i - \mathrm{med}_j(X_j)|\big)$$

MAD: 0.53

→ The standard deviation is sensitive to outliers
→ The MAD is robust and thus more reliable
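In code the MAD is a one-liner (illustrative only; the demo data are made up):

```python
import numpy as np

def mad(x):
    """Median absolute deviation, scaled by 1.483 so that it is
    consistent with the standard deviation at the normal model."""
    return 1.483 * np.median(np.abs(x - np.median(x)))

x = np.array([2.2, 3.0, 3.4, 3.6, 28.9])   # one large outlier
print(np.std(x, ddof=1), mad(x))           # the sd is inflated, the MAD is not
```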


Robustness

M-estimators of scale

An M-estimator of scale is the solution $S_n$ of
$$\sum_{i=1}^n \psi(X_i/S_n) = 0$$

Symmetric distributions: use symmetric $\psi$ functions

Consistent if $\int \psi(y)\, dF(y) = E_F(\psi(y)) = 0$

The Tukey biweight loss functions $\rho_c$ are symmetric. Put $b = E_\Phi(\rho_c)$ and define $\psi_c(t) = \rho_c(t) - b$; then the Tukey biweight M-estimator of scale $S_n$ solves
$$\sum_{i=1}^n \psi_c(X_i/S_n) = 0, \quad \text{or equivalently} \quad \frac{1}{n}\sum_{i=1}^n \rho_c(X_i/S_n) = b$$
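A Python sketch of the biweight M-scale via the standard fixed-point iteration (illustrative only; the tuning constant c ≈ 1.55, targeting a 50% breakdown point, and the Monte Carlo approximation of b = E_Φ(ρ_c) are choices made here, not prescribed by the slide):

```python
import numpy as np

def rho_biweight(t, c):
    """Tukey biweight loss t^2/2 - t^4/(2c^2) + t^6/(6c^4), capped at c^2/6."""
    t = np.minimum(np.abs(t), c)
    return t**2 / 2 - t**4 / (2 * c**2) + t**6 / (6 * c**4)

def m_scale(x, c=1.548, tol=1e-8, max_iter=100):
    """Biweight M-scale of (already centered) data x: solves mean(rho_c(x/S)) = b."""
    z = np.random.default_rng(0).normal(size=200_000)
    b = np.mean(rho_biweight(z, c))                   # b = E_Phi(rho_c), by Monte Carlo
    S = 1.483 * np.median(np.abs(x - np.median(x)))   # MAD as starting value
    for _ in range(max_iter):
        S_new = S * np.sqrt(np.mean(rho_biweight(x / S, c)) / b)
        if abs(S_new / S - 1) < tol:
            break
        S = S_new
    return S_new
```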


Robustness

Example: Copper content in flour

Copper content (parts per million) in 24 wholemeal flour samples

Standard deviation: 5.30

Median absolute deviation: 0.53

Tukey biweight M-estimator: 0.66


Robustness

Robust regression

Denote by $r_i(\beta) = y_i - x_i^t\beta$ the residuals corresponding to $\beta$

$\hat{\beta}_{LS}$ solves
$$\min_\beta \sum_{i=1}^n (y_i - x_i^t\beta)^2 = \sum_{i=1}^n (r_i(\beta))^2$$

Denote by $\hat{\sigma}(\beta) = \sqrt{\dfrac{\sum_{i=1}^n (r_i(\beta))^2}{n - d}}$ the estimate of the residual scale

The LS estimator $\hat{\beta}_{LS}$ then equivalently solves $\min_\beta \hat{\sigma}(\beta)$

⇒ Instead, minimize a robust estimate of the residual scale


Robustness

Least Median of Squares regression

LS → LMS

Minimize $\dfrac{1}{n-d}\sum_{i=1}^n r_i(\beta)^2$ → Minimize $\mathrm{med}_i\, r_i(\beta)^2$

Maximal breakdown point (50%)
Small bias
Slow rate of convergence ($n^{-1/3}$)
Inefficient


Robustness

Least Trimmed Squares regression

LS → LTS

Minimize $\dfrac{1}{n-d}\sum_{i=1}^n r_i(\beta)^2$ → Minimize $\dfrac{1}{h}\sum_{i=1}^h (r(\beta)^2)_{i:n}$

where $(r(\beta)^2)_{1:n} \le \cdots \le (r(\beta)^2)_{n:n}$ are the ordered squared residuals

Breakdown point is $\min\{h, n-h\}/n \le 50\%$
Asymptotically normal
Trade-off robustness-efficiency
Low efficiency (less than 10%)


Robustness

Regression S-estimators

LS → S-estimate

Minimize $\dfrac{1}{n}\sum_{i=1}^n r_i(\beta)^2$ → Minimize $\hat{\sigma}(\beta)$

For each $\beta$, $\hat{\sigma}(\beta)$ solves $\dfrac{1}{n}\sum_i \rho_c\!\left(\dfrac{r_i(\beta)}{\hat{\sigma}}\right) = b$

$c$ determines both robustness and efficiency
Trade-off robustness-efficiency
Breakdown point can be up to 50%
Asymptotically normal
Efficiency can still be low (less than 35%)


Robustness

Regression M-estimators

LS → M-estimate

Minimize $\sum_{i=1}^n r_i(\beta)^2$ → Minimize $\sum_{i=1}^n \rho\!\left(\dfrac{r_i(\beta)}{\hat{\sigma}}\right)$

or solve
$$\sum_{i=1}^n \psi\!\left(\frac{r_i(\beta)}{\hat{\sigma}}\right) x_i = 0$$

Requires a robust scale estimate $\hat{\sigma}$!


Robustness

MM estimates

LS → MM-estimate

Minimize $\sum_{i=1}^n r_i(\beta)^2$ → Minimize $\sum_{i=1}^n \rho\!\left(\dfrac{r_i(\beta)}{\hat{\sigma}}\right)$

$\hat{\sigma}$ is the S-estimator's M-scale

The M- and S-estimators both use Tukey biweight $\rho_c$ functions

The S-estimator is tuned for robustness (breakdown point)

The redescending M-estimator is tuned for efficiency


Robustness

MM: loss functions

Tukey biweight family (rescaled so that the maximum equals 1):
$$\rho_c(t) = \begin{cases} 3\dfrac{t^2}{c^2} - 3\dfrac{t^4}{c^4} + \dfrac{t^6}{c^6} & \text{if } |t| \le c \\ 1 & \text{if } |t| > c, \end{cases}$$

[Figure: the two loss functions $\rho_0$ (tuning constant $c_0$) and $\rho_1$ (tuning constant $c_1 > c_0$)]

$\rho_0$ determines the breakdown point (S-estimator)
$\rho_1$ determines the efficiency (MM-estimator)


Robustness

MM estimates

Combining the two steps (the S-estimator's M-scale $\hat{\sigma}$ with a redescending M-estimator tuned for efficiency, as on the previous slides) gives an estimator that is:

Highly robust and efficient!


Robustness

Redescending psi function

⋆ A redescending psi function is needed for robustness, but this implies

Multiple solutions of score equations

Global solution is needed (high breakdown point)

Difficult (time consuming) to compute


Robust variable selection: sequencing

Robust variable selection

Issues

Robust regression estimators are computationally demanding

’Outliers’ depend on the model under consideration

High dimensional data: Outlying cases?

Our approach: a two-step procedure

Sequencing: Construct a reduced sequence of good predictors in an efficient way.

Segmentation: Build an optimal model from the reduced set of predictors.


Robust variable selection: sequencing

Sequencing the variables in order of importance

Automatic variable selection methods such as forward/stepwise selection, LAR and L2 boosting are computationally efficient methods to sequence predictors

These methods are based only on the means, variances and correlations of the data.

⇒ Construct computationally efficient, robust methods to sequence predictors by using computationally efficient and highly robust estimates of center, scale and correlation


Robust variable selection: sequencing

Robust building blocks

Location: Median

Scatter: Median Absolute Deviation

Correlation: Bivariate Winsorization

Correlation: Bivariate M-estimators

Correlation: Gnanadesikan-Kettenring estimators


Robust variable selection: sequencing

Winsorized correlation estimates

1 Robustly standardize the data using median and MAD

2 Transform the data by shifting outliers towards the center

3 Calculate the Pearson correlation of the transformed data


Robust variable selection: sequencing

Univariate Winsorization

Componentwise transformation: $u = \psi_c(x) = \min(\max(-c, x), c)$

[Figure: the Huber ψ function with c = 2, used as the componentwise transformation]
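A Python sketch combining the three steps of the previous slide with this transformation (illustrative only):

```python
import numpy as np

def mad(x):
    return 1.483 * np.median(np.abs(x - np.median(x)))

def winsorized_cor(x, y, c=2.0):
    """Univariate Winsorized correlation: standardize robustly with
    median/MAD, shrink outliers with psi_c, then take the Pearson
    correlation of the transformed data."""
    u = np.clip((x - np.median(x)) / mad(x), -c, c)   # u = psi_c(standardized x)
    v = np.clip((y - np.median(y)) / mad(y), -c, c)
    return np.corrcoef(u, v)[0, 1]
```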


Robust variable selection: sequencing

Univariate Winsorization

Componentwise transformation: $u = \psi_c(x) = \min(\max(-c, x), c)$

[Figure: bivariate scatterplot of univariately Winsorized data]


Robust variable selection: sequencing

Bivariate Winsorization

Bivariate transformation

$$u = \min\left(\sqrt{c/D(x)},\, 1\right) x \quad \text{with } c = F^{-1}_{\chi^2_2}(0.95)$$

$D(x) = x^t R_0^{-1} x$, with $R_0$ an initial bivariate correlation matrix.

[Figure: bivariate scatterplot with the points shrunken toward the center along the Winsorization ellipse]
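A Python sketch of this bivariate shrinkage (illustrative only; it assumes the two columns of Z are already robustly standardized and that an initial correlation matrix R0, e.g. from adjusted Winsorization, is given):

```python
import numpy as np
from scipy.stats import chi2

def bivariate_winsorized_cor(Z, R0):
    """Shrink each point x by min(sqrt(c/D(x)), 1), D(x) = x' R0^{-1} x,
    then take the Pearson correlation of the transformed data."""
    c = chi2.ppf(0.95, df=2)                 # c = F^{-1}_{chi2_2}(0.95)
    D = np.einsum('ij,jk,ik->i', Z, np.linalg.inv(R0), Z)
    shrink = np.minimum(np.sqrt(c / np.maximum(D, 1e-12)), 1.0)
    U = Z * shrink[:, None]
    return np.corrcoef(U[:, 0], U[:, 1])[0, 1]
```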


Robust variable selection: sequencing

Initial correlation estimate

Adjusted Winsorization: Univariate Winsorization with different tuning constants for different quadrants.

Denote by $h$ the ratio of the number of observations in the second and fourth quadrants to the number in the first and third quadrants.

Suppose $h \le 1$, then
Use constant $c_1$ for Winsorizing points in the first and third quadrants
Use $c_2 = h\, c_1$ for the second and fourth quadrants

$R_0$ is the correlation matrix of the adjusted Winsorized data


Robust variable selection: sequencing

Initial correlation estimate

Adjusted Winsorization: Univariate Winsorization with different tuning constants for different quadrants.

[Figure: scatterplot showing adjusted Winsorization, with a larger constant in the major quadrants]


Robust variable selection: sequencing

Initial correlation estimate

Univariate Winsorization

[Figure: the same scatterplot after univariate Winsorization]


Robust variable selection: sequencing

Correlation M-estimators

1 First center the two variables using their medians

2 An M-estimate of the covariance matrix is the solution V ofthe equation

1n

i

u2(d2i )xix′i = V,

where d2i = x′iV

−1xi and u2(t) = min(χ22(0.99)/t, 1)

3 Calculate the correlation corresponding to the bivariatecovariance matrix V


Robust variable selection: sequencing

Gnanadesikan-Kettenring correlation estimators

Consider the identity
$$\mathrm{cov}(X, Y) = \frac{1}{4}\left(\mathrm{sd}(X + Y)^2 - \mathrm{sd}(X - Y)^2\right)$$

Replace the sample standard deviations by robust estimates of scale to obtain robust correlation estimates
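A Python sketch of such a Gnanadesikan-Kettenring correlation with the MAD as the robust scale (illustrative only; standardizing the variables first and clipping to [−1, 1] are conventions chosen here, not prescribed by the identity):

```python
import numpy as np

def mad(x):
    return 1.483 * np.median(np.abs(x - np.median(x)))

def gk_cor(x, y):
    """Robust correlation from cov(X,Y) = (sd(X+Y)^2 - sd(X-Y)^2)/4,
    with the MAD replacing the standard deviation."""
    u = (x - np.median(x)) / mad(x)
    v = (y - np.median(y)) / mad(y)
    r = (mad(u + v)**2 - mad(u - v)**2) / 4.0
    return float(np.clip(r, -1.0, 1.0))
```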


Robust variable selection: sequencing

Robust correlations: Computational efficiency

[Figure: CPU time versus sample size for the Uni-Winsor, Adj-Winsor, Bi-Winsor and Maronna correlation estimators]


Robust variable selection: sequencing

Robust LAR: Computational efficiency

The computational efficiency of the correlations largely determines the computing time of Robust LAR

[Figure: CPU time versus dimension for LARS, W-RLARS and M-RLARS]


Robust variable selection: sequencing

Bootstrapping the sequencing algorithms

Use bootstrap averages to obtain more reliable and stable sequences.

Procedure:
1 Generate 50 bootstrap samples
2 Sequence the predictors in each sample
3 Rank the predictors according to their average rank over the bootstrap samples

Not all predictors have to be ranked in each bootstrap sample
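A Python sketch of this wrapper around any sequencing routine (illustrative only; assigning unranked predictors the worst rank d is one simple convention for the last point):

```python
import numpy as np

def bootstrap_sequence(X, y, sequencer, B=50, seed=0):
    """Rank predictors by their average rank over B bootstrap samples.

    `sequencer(X, y)` returns predictor indices in order of importance
    (e.g., a robust LAR); it may rank only some of the predictors."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    ranks = np.full((B, d), d, dtype=float)      # unranked -> worst rank d
    for b in range(B):
        idx = rng.integers(0, n, size=n)         # bootstrap sample
        for rank, j in enumerate(sequencer(X[idx], y[idx])):
            ranks[b, j] = rank
    return np.argsort(ranks.mean(axis=0))        # best average rank first
```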


Robust variable selection: sequencing

Bootstrap effect on robust LAR

Simulation design

Samples of size 150 in 200 dimensions

10 target predictors

20 noise covariates correlated with target predictors

170 independent noise covariates

10% of symmetric or asymmetric high leverage outliers

We compare with random forests, using variable importance measures to sequence the variables


Robust variable selection: sequencing

Bootstrap RLAR vs RLAR/Random Forests

[Figure: number of target variables captured versus number of sequenced variables, for B-RLARS, RLARS, RF-OOB and RF-IMP; left panel: symmetric high-leverage outliers, right panel: asymmetric high-leverage outliers]


Robust variable selection: sequencing

Example: Demographic data

n = 50 states of USA, d = 25 covariates.

Response y = murder rate

One outlier

5-fold cross validation selects a model with 7 variables

We sequence the variables using B-RLARS

Construct a learning curve
Graphical tool to select the size of the reduced sequence in practice
Based on a robust $R^2$ measure, e.g.
$$R^2 = 1 - \frac{\mathrm{Med}(\text{residual}^2)}{\mathrm{MAD}^2(y)}$$


Robust variable selection: sequencing

Demographic data: learning curve

[Figure: learning curve — learning rate versus number of variables in the model]

⇒ Reduced set of at most 12 predictors


Robust variable selection: sequencing

Demographic data: models

Full CV model: 7 predictors

B-RLAR+CV: 6 predictors

LAR+CV: 8 predictors

RF-SEL: 5 predictors

RF-SEL+CV: 4 predictors

RF-RED+CV: 5 predictors

MSVM-RFE: 8 predictors

MSVM-RFE+CV: 6 predictors


Robust variable selection: sequencing

Demographic data: model comparison

Density estimates based on 1000 5-fold CV-MSPE estimates.

[Figure: density estimates of the 5-fold CV-MSPE for the Full-CV, LARS+CV, B-RLARS+CV, RF-SEL+CV, RF-RED+CV and MSVM-RFE models]


Robust variable selection: sequencing

Example: Protein data

n = 4141 protein sequences, d = 77 covariates.

Training sample of size 2072 and test sample of size 2069.

We selected predictors using
B-RLAR: 5 predictors
RF using OOB importance: 22 predictors
MSVM-RFE: 22 predictors

For RF we could determine an optimal submodel in the reduced sequence using robust MM-estimates with the robust FPE.
⇒ RF+RFPE: 18 predictors


Robust variable selection: sequencing

Protein data: test sample errors

Trimmed means of squared prediction errors

Trimming fraction:

Model     |     1% |     5% |    10%
----------|--------|--------|-------
B-RLAR    | 116.19 |  97.73 |  84.67
RF        | 111.11 |  93.80 |  81.30
RF-RFPE   | 111.30 |  93.92 |  81.27
MSVM-RFE  | 173.70 | 150.48 | 133.17


Robust variable selection: sequencing

Example: Particle data

Quantum physics data with d = 64 predictors.

Training sample of size 5,000, test sample of size 45,000.

FS and SW produced a model with 25 predictors.

Robust FS and SW produced a model with only 1 predictor.

Indeed for more than 80% of the cases X1 = Y = 0.

For the cases with X1 ≠ 0, FS produced a model with 5 predictors.

We fit the final models using MM-estimators.


Robust variable selection: sequencing

Particle data: test sample errors

Trimmed means of squared prediction errors

Trimming fraction:

Model     |    1% |    5%
----------|-------|------
FS        | 0.110 | 0.012
Robust FS | 0.032 | 0.001


Robust variable selection: segmentation

Segmentation: Robust adjusted R-squared

Adjusted $R^2$: $A(\alpha) = 1 - \dfrac{\mathrm{RSS}(\alpha)/(n - d(\alpha))}{\mathrm{RSS}(1)/(n - 1)}$

Based on a robust regression estimator we can construct a robust adjusted $R^2$:

$$RR^2_a(\alpha) = 1 - \frac{\hat{\sigma}_\alpha^2/(n - d(\alpha))}{\hat{\sigma}_0^2/(n - 1)},$$

$\hat{\sigma}_\alpha$ is the robust residual scale of the submodel with predictors indexed by $\alpha$

$\hat{\sigma}_0$ is the robust residual scale of the intercept-only model
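In code this is a one-liner (illustrative only):

```python
def robust_adjusted_r2(sigma_alpha, d_alpha, sigma_0, n):
    """Robust adjusted R^2: sigma_alpha is the robust residual scale of
    the submodel, sigma_0 that of the intercept-only model."""
    return 1.0 - (sigma_alpha**2 / (n - d_alpha)) / (sigma_0**2 / (n - 1))
```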


Robust variable selection: segmentation

Segmentation: Robust FPE

$\mathrm{FPE}(\alpha) = \dfrac{\mathrm{RSS}(\alpha)}{\hat{\sigma}^2} + 2d(\alpha)$ estimates the final prediction error
$$\mathrm{FPE}(\alpha) = \frac{1}{\sigma^2}\sum_{i=1}^n E\big[(z_i - x_{\alpha i}^t \hat{\beta}_\alpha)^2\big],$$
assuming that the model is correct.

Consider now the robust final prediction error:
$$\mathrm{RFPE}(\alpha) = \sum_{i=1}^n E\left[\rho\!\left(\frac{z_i - x_{\alpha i}^t \hat{\beta}_\alpha}{\sigma}\right)\right].$$
Assuming that the model is correct and using a second order Taylor expansion, this can be estimated by
$$\widehat{\mathrm{RFPE}}(\alpha) = \sum_{i=1}^n \rho\big(r_i(\hat{\beta}_\alpha)/\hat{\sigma}_n\big) + d(\alpha)\, \frac{\sum_{i=1}^n \psi^2\big(r_i(\hat{\beta}_\alpha)/\hat{\sigma}_n\big)}{\sum_{i=1}^n \psi'\big(r_i(\hat{\beta}_\alpha)/\hat{\sigma}_n\big)}$$

$\hat{\sigma}_n$ is the robust scale estimate of a 'full' model $\alpha_f$. Usually, $\alpha_f = \{1, \ldots, d\}$
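A Python sketch of this estimate, taking the residuals, the full-model scale and the loss function as inputs (illustrative only; ρ and its derivatives are supplied by the caller, e.g. from the Tukey biweight family):

```python
import numpy as np

def rfpe(residuals, sigma_n, d_alpha, rho, psi, psi_prime):
    """Robust FPE of a submodel with d_alpha coefficients.

    residuals are r_i(beta_hat_alpha); sigma_n is the robust residual
    scale of the full model; psi = rho' and psi_prime = psi'."""
    t = residuals / sigma_n
    return np.sum(rho(t)) + d_alpha * np.sum(psi(t)**2) / np.sum(psi_prime(t))
```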


Robust variable selection: segmentation

Robust resampling based selection criteria

Robust equivalents of the resampling based selection criteria:

$$\mathrm{RPE}(\alpha) = \frac{\hat{\sigma}_n^2}{n}\, E^\star\!\left[\sum_{i=1}^n \rho\!\left(\frac{z_i - x_{\alpha i}^t \hat{\beta}_\alpha}{\hat{\sigma}_n}\right) \,\middle|\, y, X\right]$$

$$\mathrm{PRPE}(\alpha) = \frac{\hat{\sigma}_n^2}{n}\left\{\sum_{i=1}^n \rho\!\left(\frac{y_i - x_{\alpha i}^t \hat{\beta}_\alpha}{\hat{\sigma}_n}\right) + f(n)\, d(\alpha)\right\} + M_n(\alpha)$$

$\rho$ is the MM loss function and $\hat{\beta}_{\alpha,n}$ is the MM estimate

$f(n)\, d(\alpha)$ is the penalty term, with e.g. $f(n) = 2\log n$

$\hat{\sigma}_n$ is the robust scale estimate of a 'full' model $\alpha_f$. Usually, $\alpha_f = \{1, \ldots, d\}$

$E^\star$ is a robust resampling estimate of the expected value


Robust variable selection: segmentation

Robustness and resampling

Resampling robust estimators causes problems with

robustness

speed

The stratified bootstrap (Müller and Welsh, JASA, 2005) only solves the first problem.
→ Limited practical use.

The fast and robust bootstrap solves both problems.


Robust variable selection: segmentation

MM-estimators revisited

For the model comparison we use slightly adjusted MM-estimators. The MM-estimates $\hat{\beta}_\alpha$ satisfy
$$\frac{1}{n}\sum_{i=1}^n \psi_1\!\left(\frac{y_i - x_{\alpha i}^t \hat{\beta}_\alpha}{\hat{\sigma}_n}\right) x_{\alpha i} = 0,$$
where $\hat{\sigma}_n$ minimizes the M-scale $\hat{\sigma}_n(\beta)$, which for any $\beta \in \mathbb{R}^d$ is defined as the solution of
$$\frac{1}{n}\sum_{i=1}^n \rho_0\!\left(\frac{y_i - x_i^t \beta}{\hat{\sigma}_n(\beta)}\right) = b$$

$\rho_0$ determines the breakdown point (S-estimator)

$\rho_1$ determines the efficiency (MM-estimator)


Robust variable selection: segmentation

Bootstrapping MM-estimates

Weighted least squares representation of the MM-estimator:
$$\hat{\beta}_{\alpha,n} = \left[\sum_{i=1}^n \omega_{\alpha i}\, x_{\alpha i} x_{\alpha i}^t\right]^{-1} \sum_{i=1}^n \omega_{\alpha i}\, x_{\alpha i} y_i$$
with $\omega_{\alpha i} = \rho_1'(r_{\alpha i}/\hat{\sigma}_n)/r_{\alpha i}$ and $r_{\alpha i} = y_i - \hat{\beta}_{\alpha,n}^t x_{\alpha i}$

Let $(y_i^\star, x_{\alpha i}^\star)$, $i = 1, \ldots, m$ be a bootstrap sample of size $m \le n$. Then $\hat{\beta}_\alpha^\star$ satisfies
$$\hat{\beta}_{\alpha,m}^\star = \left[\sum_{i=1}^m \omega_{\alpha i}^\star\, x_{\alpha i}^\star x_{\alpha i}^{\star t}\right]^{-1} \sum_{i=1}^m \omega_{\alpha i}^\star\, x_{\alpha i}^\star y_i^\star$$
with $\omega_{\alpha i}^\star = \rho_1'(r_{\alpha i}^\star/\hat{\sigma}_n^\star)/r_{\alpha i}^\star$ and $r_{\alpha i}^\star = y_i^\star - \hat{\beta}_{\alpha,m}^{\star t} x_{\alpha i}^\star$


Robust variable selection: segmentation

Fast and robust bootstrap

Weighted least squares representation of the MM-estimator:
$$\hat{\beta}_{\alpha,n} = \left[\sum_{i=1}^n \omega_{\alpha i}\, x_{\alpha i} x_{\alpha i}^t\right]^{-1} \sum_{i=1}^n \omega_{\alpha i}\, x_{\alpha i} y_i$$
with $\omega_{\alpha i} = \rho_1'(r_{\alpha i}/\hat{\sigma}_n)/r_{\alpha i}$ and $r_{\alpha i} = y_i - \hat{\beta}_{\alpha,n}^t x_{\alpha i}$

Let $(y_i^\star, x_{\alpha i}^\star)$, $i = 1, \ldots, m$ be a bootstrap sample of size $m \le n$. Define $\hat{\beta}_\alpha^{1,\star}$ by
$$\hat{\beta}_{\alpha,m}^{1,\star} = \left[\sum_{i=1}^m \omega_{\alpha i}^\star\, x_{\alpha i}^\star x_{\alpha i}^{\star t}\right]^{-1} \sum_{i=1}^m \omega_{\alpha i}^\star\, x_{\alpha i}^\star y_i^\star$$
with $\omega_{\alpha i}^\star = \rho_1'(r_{\alpha i}^\star/\hat{\sigma}_n)/r_{\alpha i}^\star$ and $r_{\alpha i}^\star = y_i^\star - \hat{\beta}_{\alpha,n}^t x_{\alpha i}^\star$

Note that $\hat{\beta}_{\alpha,n}$ and $\hat{\sigma}_n$ are not recalculated!


Robust variable selection: segmentation

Fast and robust bootstrap

The estimates $\hat{\beta}_{\alpha,m}^{1,\star}$ under-estimate the variability of the completely recalculated estimates $\hat{\beta}_{\alpha,m}^\star$
→ A correction is needed

The fast and robust bootstrap estimates $\hat{\beta}_{\alpha,m}^{R\star}$ are given by
$$\hat{\beta}_{\alpha,m}^{R\star} = \hat{\beta}_{\alpha,n} + K_{\alpha,n}\left(\hat{\beta}_{\alpha,m}^{1,\star} - \hat{\beta}_{\alpha,n}\right)$$
where
$$K_{\alpha,n} = \hat{\sigma}_n \left[\sum_{i=1}^n \rho_1''(r_{\alpha i}/\hat{\sigma}_n)\, x_{\alpha i} x_{\alpha i}^t\right]^{-1} \sum_{i=1}^n \omega_{\alpha i}\, x_{\alpha i} x_{\alpha i}^t$$

Note that $K_{\alpha,n}$ is computed only once, for the original sample.
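A Python sketch of the whole fast and robust bootstrap loop (illustrative only; it assumes no residual is exactly zero and that the first and second derivatives of ρ₁ are supplied, e.g. from the Tukey biweight family):

```python
import numpy as np

def frb_replicates(X_a, y, beta, sigma, rho1_d1, rho1_d2, B=1000, m=None, seed=0):
    """Fast and robust bootstrap for an MM-estimate on submodel matrix X_a.

    beta and sigma come from the ORIGINAL sample and are never
    recalculated; each replicate is one weighted LS solve plus the
    linear correction with K_{alpha,n}."""
    n, p = X_a.shape
    m = n if m is None else m                    # m-out-of-n bootstrap
    rng = np.random.default_rng(seed)
    r = y - X_a @ beta
    w = rho1_d1(r / sigma) / r                   # fixed observation weights
    # Correction matrix K, computed once on the original sample
    A = X_a.T @ (rho1_d2(r / sigma)[:, None] * X_a)
    K = sigma * np.linalg.solve(A, X_a.T @ (w[:, None] * X_a))
    out = np.empty((B, p))
    for b in range(B):
        idx = rng.integers(0, n, size=m)
        Xb, yb, wb = X_a[idx], y[idx], w[idx]
        beta1 = np.linalg.solve(Xb.T @ (wb[:, None] * Xb), Xb.T @ (wb * yb))
        out[b] = beta + K @ (beta1 - beta)       # fast and robust correction
    return out
```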


Robust variable selection: segmentation

Properties of fast and robust bootstrap

Computationally efficient: weighted least squares calculations

Robust: No recalculation of observation weights


Robust variable selection: segmentation

Consistent model selection

Suppose a true model $\alpha_0 \subset \{1, \ldots, d\}$ exists and is included in the set $\mathcal{A}$ of models considered.

If we select the model that minimizes $\mathrm{RPE}(\alpha)$ or $\mathrm{PRPE}(\alpha)$, that is
$$\hat{\alpha}_{m,n} = \arg\min_{\alpha \in \mathcal{A}} \mathrm{RPE}(\alpha) \quad \text{and} \quad \tilde{\alpha}_{m,n} = \arg\min_{\alpha \in \mathcal{A}} \mathrm{PRPE}(\alpha),$$
then, under appropriate regularity conditions, the model selection criteria are consistent in the sense that
$$\lim_{n\to\infty} P(\hat{\alpha}_{m,n} = \alpha_0) = 1 \quad \text{and} \quad \lim_{n\to\infty} P(\tilde{\alpha}_{m,n} = \alpha_0) = 1.$$

Two conditions have practical consequences:
$m = o(n)$ ($m$ out of $n$ bootstrap)
$f(n) = o(n/m)$


Robust variable selection: segmentation

Examples

We compare the full model with models selected by backward elimination based on
RPE(α)
PRPE(α) with f(n) = log(n)
RFPE

For each of the models we report $RR^2_a(\alpha)$, the robust adjusted $R^2$

To compare predictive power we calculated the 5-fold CV trimmed MSPE


Robust variable selection: segmentation

Example 1: Ozone data

Los Angeles Ozone Pollution Data, 1976
366 observations (different days) on 9 variables
Response: temperature (degrees F) at El Monte, CA
Covariates: measurements of temperature, pressure, humidity, ozone, etc. at other places in CA
We start from the full quadratic model (d = 45)

Model         | size | RR²a   | 5% Trimmed MSPE
--------------|------|--------|----------------
Full          | 45   | 0.8660 | 10.78
RFPE          | 23   | 0.8174 | 10.66
α̂m,n (RPE)   | 10   | 0.7583 | 11.67
α̃m,n (PRPE)  | 7    | 0.7643 | 10.45


Robust variable selection: segmentation

Example 2: Diabetes data

442 observations on 16 variables.
Response: measure of disease progression one year after baseline
Covariates: 10 baseline variables (age, sex, BMI, blood pressure, ...)
We start from a quadratic model with some interactions (d = 65)

Model         | size | RR²a   | 5% Trimmed MSE
--------------|------|--------|---------------
Full          | 65   | 0.7731 | 4988.1
RFPE          | 16   | 0.6045 | 2231.2
α̂m,n (RPE)   | 11   | 0.5127 | 2657.2
α̃m,n (PRPE)  | 7    | 0.5302 | 2497.0


References

Khan, J.A., Van Aelst, S., and Zamar, R.H. (2007). Building a Robust Linear Model with Forward Selection and Stepwise Procedures. Computational Statistics and Data Analysis, 52, 239-248.

Khan, J.A., Van Aelst, S., and Zamar, R.H. (2007). Robust Linear Model Selection Based on Least Angle Regression. Journal of the American Statistical Association, 102, 1289-1299.

Lutz, R.W., Kalisch, M., and Bühlmann, P. (2008). Robustified L2 Boosting. Computational Statistics and Data Analysis, 52, 3331-3341.

Maronna, R.A., Martin, D.R., and Yohai, V.J. (2006). Robust Statistics: Theory and Methods. Wiley: New York.

Salibian-Barrera, M. and Van Aelst, S. (2007). Robust Model Selection Using Fast and Robust Bootstrap. Computational Statistics and Data Analysis, 52, 5121-5135.
