Estimation and Accuracy after Model Selection
TRANSCRIPT
Bradley Efron
Sections: Motivation, Bootstrap Smoothing, Results
Estimation and Accuracy after Model Selection, by Bradley Efron (Stanford)
Sahir Rai Bhatnagar
McGill University
April 7, 2014

Who?

Born in St. Paul, Minnesota in 1938 to Jewish-Russian immigrants
B.S., Mathematics, Caltech (1960)
Ph.D., Statistics, Stanford (1964), under the direction of Rupert Miller and Herb Solomon
Professor of Statistics at Stanford for the past 50 years

Achievements

Best known for the Bootstrap, Annals of Statistics (1979)
Founding Editor, Annals of Applied Statistics
Awarded the Guy Medal in Gold from the RSS (2014); 34 awarded since 1892, including Rao, Cox, Fisher, and Nelder

National Medal of Science (2005)

Established by Congress in 1959 and administered by the National Science Foundation, the medal is the nation's highest scientific honour

Some Quotes

"Statistics is the science of information gathering, especially when the information arrives in little pieces instead of big ones"
"Statistics did not come naturally to me. Dad's keeping score for the baseball league helped a lot"
"I spent the first year at Stanford in the Math Department... After, I started taking stats courses, which I thought would be easy. In fact I found them harder"

Motivation

A Quick Review of the Bootstrap
Typical Model Selection Setting
Cholesterol Data Example
Prostate Data Example

Typical Model Selection Setting

Look at the data: one response, many covariates
Identify a list of candidate models M: 2^p submodels (linear, quadratic, cubic, ...)
Perform model selection (see Abbas class notes)
Do inference based on the chosen model: prediction, confidence intervals
Today's Question: Should we care about the variability of the variable selection step in our post-selection inference?
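A minimal end-to-end sketch of the four steps above (simulated data, polynomial candidate models, a simple AIC-style score standing in for the selection rule; everything here is illustrative, not the talk's actual analysis):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 164
x = rng.standard_normal(n)                    # one covariate
y = 10 + 5 * x + rng.standard_normal(n)       # truth is linear

def fit(deg):
    """OLS fit of a degree-`deg` polynomial; returns design, coefficients, RSS."""
    X = np.vander(x, deg + 1, increasing=True)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(((y - X @ beta) ** 2).sum())
    return X, beta, rss

# Steps 2-3: score every candidate model and pick the winner.
score = {deg: n * np.log(fit(deg)[2] / n) + 2 * (deg + 1) for deg in range(1, 7)}
best = min(score, key=score.get)

# Step 4: naive inference conditions on `best` as if it were fixed in advance,
# ignoring the variability of the selection step -- today's question.
X, beta, _ = fit(best)
mu_hat = X @ beta                             # fitted values from the chosen model
```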

An Example: Cholesterol Data

n = 164 men took cholestyramine (meant to reduce cholesterol in the blood) for 7 years
x: a compliance measure, adjusted so that x ~ N(0, 1)
y: cholesterol decrease
Perform a regression of y on x
We want to predict the cholesterol decrease for a given compliance value: µ = E[y | x]

An Example (continued)

Multiple linear regression model:
Y = Xβ + ε,  ε_i ~ N(0, σ²)

6 candidate models, M = {linear, quadratic, ..., sextic}, e.g.
y = β₀ + β₁x + β₂x² + ... + β₆x⁶ + ε

Cp criterion for model selection:
Cp(M) = SS_res(M)/n + 2σ̂² p_M / n
(first term: goodness of fit; second term: complexity)

Use the OLS estimate for β from the chosen model and predict:
µ̂ = Xβ̂
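The Cp score can be computed directly; a sketch on simulated data (coefficients made up), estimating σ̂² from the largest (sextic) candidate, which is one common convention:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 164
x = rng.standard_normal(n)
y = 28 + 8 * x + 2 * x**2 + 4 * rng.standard_normal(n)   # illustrative truth

def rss(deg):
    """Residual sum of squares of the degree-`deg` polynomial OLS fit."""
    X = np.vander(x, deg + 1, increasing=True)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(((y - X @ beta) ** 2).sum())

sigma2 = rss(6) / (n - 7)        # sigma^2 estimated from the sextic model (p = 7)

def cp(deg):
    """Cp(M) = SS_res(M)/n  (goodness of fit)  +  2*sigma2*p_M/n  (complexity)."""
    p_M = deg + 1
    return rss(deg) / n + 2 * sigma2 * p_M / n

chosen = min(range(1, 7), key=cp)
```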

An Example: Nonparametric Bootstrap Analysis

Bootstrap the data:
data* = {(x_j, y_j)*, j = 1, ..., n}
where the (x_j, y_j)* are drawn randomly with replacement from the original data

data* → (Cp) → M̂* → (OLS) → β̂*_{M̂*} → µ̂* = X_{M̂*} β̂*_{M̂*}

Repeat B = 4000 times
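The resampling loop, in miniature (small B, simulated data, and a stand-in selection-and-fit step; illustrative only):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
x = rng.standard_normal(n)
y = 1 + 2 * x + rng.standard_normal(n)

def select_and_fit(xb, yb):
    """Stand-in for: data* ->(Cp) M* ->(OLS) beta* -> mu* = X_M* beta*_M*.
    Picks a polynomial degree by a penalized-RSS score, then predicts at the
    ORIGINAL x values so that mu* is comparable across replications."""
    best_deg, best_score, best_beta = None, np.inf, None
    for deg in (1, 2, 3):
        Xb = np.vander(xb, deg + 1, increasing=True)
        beta, *_ = np.linalg.lstsq(Xb, yb, rcond=None)
        score = float(((yb - Xb @ beta) ** 2).sum()) + 2 * (deg + 1)
        if score < best_score:
            best_deg, best_score, best_beta = deg, score, beta
    return np.vander(x, best_deg + 1, increasing=True) @ best_beta

B = 200                                       # the talk uses B = 4000
mu_star = np.empty((B, n))
for i in range(B):
    idx = rng.integers(0, n, size=n)          # (x_j, y_j)* drawn with replacement
    mu_star[i] = select_and_fit(x[idx], y[idx])
```

Each row of `mu_star` is one replication µ̂*; histograms of a single column are what the subject-95 figures below display.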

[Figure reproduced from Efron 2013]

[Figure reproduced from Efron 2013]

Prostate Data

Examine the relation between the level of PSA and clinical measures
n = 97 men who were about to receive a prostatectomy
x = (x₁, ..., x₈): clinical measures (adjusted so that x ~ N(0, 1))
y = log PSA
Perform a regression of y on x
8 candidate models were identified using regsubsets with nbest=1
We want to estimate µ_j = E[y | x_j], j = 1, ..., 97

[Figure: Fitted values for subject 95, from B = 4000 nonparametric bootstrap replications of the Cp-chosen model; 60% of the replications are greater than the original estimate of 3.6]

[Figure: Fitted values for subject 95, from B = 4000 nonparametric bootstrap replications, separated by the three models most frequently chosen by Cp (m3: 18%, m5: 22%, m7: 24%); original estimate 3.6 based on the Cp-chosen model]

[Figure: Boxplots of fitted values for subject 95, by the model chosen by the Cp criterion, based on B = 4000 nonparametric bootstrap samples; selection frequencies m2: 1%, m3: 18%, m4: 12%, m5: 22%, m6: 15%, m7: 24%, m8: 8%; Model 7 was chosen for the original data]

Questions

Are you convinced there is a problem in the way we do post-selection inference?
Is the juice worth the squeeze?

Bagging (Breiman 1996)

Replace the original estimator µ̂ = t(y) with the bootstrap average
µ̃ = s(y) = (1/B) Σ_{i=1}^B t(y*_i)
where y*_i is the i-th bootstrap sample

Known as model averaging

"If perturbing the learning set can cause significant changes in the predictor constructed, then bagging can improve accuracy"
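Bagging in this notation, sketched with a deliberately non-smooth statistic t (a hard-thresholded mean, the kind of estimator whose instability bagging is meant to tame; the threshold 0.1 is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
y = rng.standard_normal(100)

def t(sample):
    """Non-smooth statistic: the sample mean, hard-thresholded at 0.1."""
    m = sample.mean()
    return m if abs(m) > 0.1 else 0.0

B = 1000
t_star = np.array([t(rng.choice(y, size=y.size, replace=True)) for _ in range(B)])
mu_tilde = t_star.mean()          # s(y) = (1/B) * sum_i t(y*_i), the bagged estimate
```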

Main Contribution of this Paper

t*_i = t(y*_i), i = 1, ..., B (value of the statistic in bootstrap sample i)
Y*_ij = number of times the j-th data point appears in the i-th bootstrap sample
cov_j = cov(Y*_ij, t*_i)

The nonparametric estimate of standard deviation for the ideal smoothed bootstrap statistic µ̃ = s(y) = B⁻¹ Σ_{i=1}^B t(y*_i) is

sd̃ = [ Σ_{j=1}^n cov_j² ]^{1/2}

Main Contribution of this Paper (continued)

Note that cov_j = cov(Y*_ij, t*_i) is an unknown quantity, so we must estimate it. The estimate of standard deviation for µ̃ = s(y) in the non-ideal case is

sd̃_B = [ Σ_{j=1}^n ĉov_j² ]^{1/2}

ĉov_j = B⁻¹ Σ_{i=1}^B (Y*_ij − Y*_·j)(t*_i − t*_·)

Y*_·j = B⁻¹ Σ_{i=1}^B Y*_ij,  t*_· = B⁻¹ Σ_{i=1}^B t*_i
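The estimates ĉov_j and sd̃_B can be computed in the same pass that produces the replications; a sketch using the sample mean as t, for which sd̃_B should land near the usual σ̂/√n:

```python
import numpy as np

rng = np.random.default_rng(4)
n, B = 40, 2000
y = rng.standard_normal(n)
t = lambda s: s.mean()                        # any statistic t(y*) works here

t_star = np.empty(B)
Y = np.empty((B, n))                          # Y[i, j] = # times point j is in sample i
for i in range(B):
    idx = rng.integers(0, n, size=n)
    Y[i] = np.bincount(idx, minlength=n)
    t_star[i] = t(y[idx])

# cov_hat_j = (1/B) * sum_i (Y*_ij - Ybar*_j)(t*_i - tbar*)
cov_hat = ((Y - Y.mean(axis=0)) * (t_star - t_star.mean())[:, None]).mean(axis=0)

sd_smooth = np.sqrt((cov_hat ** 2).sum())     # sd~_B, the smoothed estimate
```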

Improvement on the Traditional Standard Error

sd̃_B = [ Σ_{j=1}^n ĉov_j² ]^{1/2}

is always less than the bootstrap estimate of standard deviation for the unsmoothed statistic,

ŝd_B = [ B⁻¹ Σ_{i=1}^B (t*_i − t*_·)² ]^{1/2}

Confidence Intervals: Three Types

1. Standard: µ̂ ± 1.96 ŝd_B
2. Percentile: [µ̂*(0.025), µ̂*(0.975)]
3. Smoothed: µ̃ ± 1.96 sd̃_B
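The three intervals side by side, with made-up replication numbers (the 3.6 / 0.3 scale merely echoes the subject-95 example; `sd_smooth` here is a placeholder value, not computed from the ĉov_j formula):

```python
import numpy as np

rng = np.random.default_rng(5)
mu_hat = 3.6                                  # original (unsmoothed) estimate
mu_star = 3.6 + 0.3 * rng.standard_normal(4000)   # pretend bootstrap replications
sd_hat = mu_star.std()                        # unsmoothed bootstrap sd
mu_smooth = mu_star.mean()                    # smoothed (bagged) estimate
sd_smooth = 0.29                              # placeholder for the cov_j-based sd

standard   = (mu_hat - 1.96 * sd_hat, mu_hat + 1.96 * sd_hat)
percentile = tuple(np.quantile(mu_star, [0.025, 0.975]))
smoothed   = (mu_smooth - 1.96 * sd_smooth, mu_smooth + 1.96 * sd_smooth)
```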

L1-Norm Penalty Functions

Recall the optimization problem of interest:

max_β  ℓ_n(β) − n Σ_{j=1}^p p(|β_j|; λ)

LASSO, SCAD and MCP Penalties

LASSO (Tibshirani, 1996):
p(|β|; λ) = λ|β|

SCAD (Fan and Li, 2001), defined via its derivative, with γ > 2:
p′(|β|; λ, γ) = λ sign(β) { I(|β| ≤ λ) + [(γλ − |β|)₊ / ((γ − 1)λ)] I(|β| > λ) }

MCP (Zhang, 2010):
p(|β|; λ, γ) = λ|β| − β²/(2γ)  if |β| ≤ γλ
p(|β|; λ, γ) = γλ²/2           if |β| > γλ
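The three penalties as plain functions (vectorized; for SCAD the slide gives the derivative p′, so that is what is coded):

```python
import numpy as np

def lasso_pen(b, lam):
    """LASSO: p(|b|; lam) = lam * |b|."""
    return lam * np.abs(b)

def scad_deriv(b, lam, gam):
    """SCAD derivative p'(|b|; lam, gam), gam > 2 (Fan and Li, 2001)."""
    ab = np.abs(b)
    inner = (ab <= lam) + np.maximum(gam * lam - ab, 0) / ((gam - 1) * lam) * (ab > lam)
    return lam * np.sign(b) * inner

def mcp_pen(b, lam, gam):
    """MCP (Zhang, 2010): quadratic until gam*lam, then flat at gam*lam^2/2."""
    ab = np.abs(b)
    return np.where(ab <= gam * lam, lam * ab - ab**2 / (2 * gam), gam * lam**2 / 2)
```

For |β| ≤ λ the SCAD derivative equals λ (LASSO-like shrinkage); beyond γλ it is zero, so large coefficients are left unshrunk, and MCP likewise flattens out at γλ²/2.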

Software

Analysis was performed in R
LASSO implemented using the glmnet package (Friedman, Hastie, and Tibshirani, 2013)
SCAD and MCP using the coordinate descent algorithm (Breheny and Huang, 2011) in the ncvreg package
BIC and Cp model selection using the leaps package (Lumley, 2009)

[Figure: Fitted values for subject 95, from B = 4000 nonparametric bootstrap replications, for the MCP, SCAD and LASSO penalties]

[Figure: Fitted values for subject 95, from B = 4000 nonparametric bootstrap replications, for BIC and Cp model selection]

[Figure: 95% confidence intervals for the fitted value of subject 95, based on B = 4000 nonparametric bootstrap samples, for the MCP, SCAD and LASSO penalties (standard, quantile and smooth interval types)]

[Figure: Lengths of 95% confidence intervals for the fitted value of subject 95, based on B = 4000 nonparametric bootstrap samples, for Cp and BIC (standard, quantile and smooth interval types)]

Table: Prostate data, B = 4000, Observation 95

| model | type     | fitted value | sd   | length | coverage |
|-------|----------|--------------|------|--------|----------|
| LASSO | standard | 3.62         | 0.31 | 1.21   | 0.94     |
| LASSO | quantile |              |      | 1.20   | 0.95     |
| LASSO | smooth   | 3.57         | 0.29 | 1.14   | 0.93     |
| SCAD  | standard | 3.60         | 0.35 | 1.37   | 0.95     |
| SCAD  | quantile |              |      | 1.33   | 0.95     |
| SCAD  | smooth   | 3.62         | 0.33 | 1.28   | 0.93     |
| MCP   | standard | 3.60         | 0.35 | 1.38   | 0.96     |
| MCP   | quantile |              |      | 1.35   | 0.95     |
| MCP   | smooth   | 3.61         | 0.33 | 1.29   | 0.94     |
| BIC   | standard | 5.50         | 4.75 | 18.62  | 0.84     |
| BIC   | quantile |              |      | 16.05  | 0.95     |
| BIC   | smooth   | 3.22         | 3.46 | 13.55  | 0.83     |
| Cp    | standard | 5.13         | 5.11 | 20.02  | 0.86     |
| Cp    | quantile |              |      | 16.15  | 0.95     |
| Cp    | smooth   | 0.64         | 4.40 | 17.24  | 0.97     |

An Example: Parametric Bootstrap Analysis

Obtain OLS estimates µ̂_OLS based on the full model
Generate y* ~ N(µ̂_OLS, I)

Full-model bootstrap:
y* → (Cp) → M̂*, β̂*_{M̂*} → µ̂* = X_{M̂*} β̂*_{M̂*}

Repeat B = 4000 times → t*_ij = µ̂*_ij

Smoothed estimates: s_j = B⁻¹ Σ_{i=1}^B t*_ij
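A parametric-bootstrap sketch in the same spirit (random design, and a crude coefficient-thresholding step standing in for the Cp/OLS selection; illustrative assumptions throughout):

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 97, 8
X = rng.standard_normal((n, p))
y = X @ rng.standard_normal(p) + rng.standard_normal(n)

# OLS fit of the FULL model gives mu_hat_OLS.
beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
mu_ols = X @ beta_full

def select_and_fit(ystar):
    """Stand-in for: y* ->(Cp) M*, beta* -> mu* = X_M* beta*_M*.
    Keeps predictors with largish full-model coefficients (NOT the real rule)."""
    b, *_ = np.linalg.lstsq(X, ystar, rcond=None)
    keep = np.abs(b) > 0.2
    if not keep.any():
        return np.full(n, ystar.mean())
    bm, *_ = np.linalg.lstsq(X[:, keep], ystar, rcond=None)
    return X[:, keep] @ bm

B = 300                                       # the talk uses B = 4000
t_star = np.array([select_and_fit(mu_ols + rng.standard_normal(n))  # y* ~ N(mu_OLS, I)
                   for _ in range(B)])

s = t_star.mean(axis=0)                       # smoothed estimates s_j = (1/B) sum_i t*_ij
```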

[Figure: Fitted values for subject 95, from B = 4000 parametric bootstrap replications of the Cp-chosen model; 53% of the replications are greater than the original estimate of 3.6]

[Figure: Fitted values for subject 95, from B = 4000 parametric bootstrap replications, separated by the three models most frequently chosen by Cp (m6, m7, m8); original estimate 3.6 based on the Cp-chosen model]

[Figure: Boxplots of fitted values for subject 95, by the model chosen by the Cp criterion, based on B = 4000 parametric bootstrap samples; selection frequencies m1: 3%, m2: 6%, m3: 16%, m4: 12%, m5: 13%, m6: 14%, m7: 17%, m8: 19%; Model 8 was chosen for the original data]

[Figure: Fitted values for subject 95, from B = 4000 parametric bootstrap replications of the BIC-chosen model; 40% of the replications are greater than the original estimate of 3.7]

[Figure: Fitted values for subject 95, from B = 4000 parametric bootstrap replications, separated by the three models most frequently chosen by BIC (m1: 20%, m2: 18%, m3: 27%); original estimate 3.7 based on the BIC-chosen model]

[Figure: Boxplots of fitted values for subject 95, by the model chosen by the BIC criterion, based on B = 4000 parametric bootstrap samples; selection frequencies m1: 20%, m2: 18%, m3: 27%, m4: 13%, m5: 9%, m6: 5%, m7: 5%, m8: 3%; Model 3 was chosen for the original data]

Discussion

Improvements for regularized procedures, where tuning parameters are also chosen in a data-driven fashion
GLMs?
Why the parametric bootstrap?

Family

Roots

What I have done so far

1. BSc Actuarial Math, Concordia (2005-2008)
2. Pension actuary (2008-2011)
3. RA at the Chest with Andrea Benedetti (2011-2012)
4. MSc Biostats, Queen's (2012-2013)

What's Next?

1. PhD Biostatistics, McGill (2013-???)
2. Supervisor: Celia Greenwood (Statistical Genetics)