
Unit 6: The basics of multiple regression
Class 14… Class 15…

© Andrew Ho, Harvard Graduate School of Education

http://xkcd.com/314/

Where is Unit 6 in our 11-Unit Sequence?

Building a solid foundation
• Unit 1: Introduction to simple linear regression
• Unit 2: Correlation and causality
• Unit 3: Inference for the regression model

Mastering the subtleties
• Unit 4: Regression assumptions: Evaluating their tenability
• Unit 5: Transformations to achieve linearity

Adding additional predictors
• Unit 6: The basics of multiple regression
• Unit 7: Statistical control in depth: Correlation and collinearity

Generalizing to other types of predictors and effects
• Unit 8: Categorical predictors I: Dichotomies
• Unit 9: Categorical predictors II: Polychotomies
• Unit 10: Interaction and quadratic effects

Pulling it all together
• Unit 11: Regression in practice. Common Extensions.

In this unit, we’re going to cover…

• Various representations of the multiple regression model:
  – An algebraic representation
  – A three-dimensional graphic representation
  – A two-dimensional graphic representation
• Multiple regression: how it works and helps improve predictions
  – Estimating the parameters of the multiple regression model
  – Holding predictors constant: what does this really mean?
• Plotting the fitted multiple regression model:
  – Deciding how to construct the plot
  – Choosing prototypical values
  – Learning how to actually construct the plot (and interpret it correctly!)
• $R^2$ and the Analysis of Variance (ANOVA) in multiple regression
• Inference in multiple regression
  – The omnibus F-test in multiple regression
  – Individual t-tests
• How might we summarize MR results in both tables and figures?

US News and World Report education school rankings from 2006


How do student characteristics like GRE scores and size of the doctoral class predict peer ratings of Ed Schools? Do schools gain in reputation for graduating large numbers of high-achieving students?

http://grad-schools.usnews.rankingsandreviews.com/best-graduate-schools/top-education-schools/edu-rankings


As always, our starting point

. su peerrate gre docgrad

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
    peerrate |        87    344.8276     45.1319        280        470
         gre |        87    557.8966     42.8993      474.5      677.5
     docgrad |        87    45.67816    33.03293          4        193

[Figure: four histograms (y-axis: Frequency) of Mean peer rating by Deans (100-500); Number of doctoral degrees granted in 2004; Average mean verbal and quantitative GRE scores; log(Number of doctoral degrees granted in 2004)]

. list school peerrate gre docgrad, clean

       school           peerrate     gre   docgrad
  1.   Harvard               450   662.5        60
  2.   UCLA                  410     578        53
  3.   Stanford              470   677.5        38
  4.   TC                    440   604.5       193
  5.   Vanderbilt            430   660.5        22
  6.   Northwestern          390     677        10
  7.   Berkeley              440     605        43
  8.   Penn                  380     604        61
  9.   Michigan              430     609        38
 10.   Madison               430     580       106
 11.   NYU                   360     596       112
 12.   MinneTC               390     575        89
 13.   Oregon                340   611.5        39
 14.   MichiganState         420   586.5        52
 15.   Indiana               390     596       110
 16.   UTAustin              400   586.5       102
 17.   Washington            370     593        37
 18.   Urbana                410     633        50
 19.   USC                   360   569.5       119
 20.   BC                    360   584.5        42
 21.   OhioState             400     541        61
 22.   Maryland              400     566        66
 23.   Uva                   400   576.5        96
 24.   GWU                   340   558.5        71
 25.   Florida               360     591        51
 26.   Iowa                  360   600.5        65
 27.   Georgia               380     549       141
 28.   ChapelHill            390     566        33
 29.   Uconn                 340     589        52
 30.   Kansas                350     535        64
 31.   Temple                320   542.5        76

Scatterplot matrix: graph matrix

. graph matrix peerrate gre docgrad

[Figure: scatterplot matrix of Mean peer rating by Deans (100-500), Average mean verbal and quantitative GRE scores, and Number of doctoral degrees granted in 2004]

Use graph combine to be more succinct.

. scatter peerrate gre || lfit peerrate gre, legend(off) ytitle("Mean peer rating by Deans (100-500)") name(peer_gre, replace)
. scatter peerrate docgrad || lfit peerrate docgrad, legend(off) ytitle("Mean peer rating by Deans (100-500)") name(peer_doc, replace)
. graph combine peer_gre peer_doc, cols(1)
. graph combine peer_gre peer_doc, cols(1) fxsize(60)

[Figure: Mean peer rating by Deans (100-500) plotted against Average mean verbal and quantitative GRE scores (top) and against Number of doctoral degrees granted in 2004 (bottom), each with a fitted line]

. rvfplot, yline(0) ytitle(Residuals of peerrate on gre) name(rvf_gre, replace)
. rvfplot, yline(0) ytitle(Residuals of peerrate on docgrad) name(rvf_doc, replace)
. graph combine rvf_gre rvf_doc, cols(1) fxsize(60)

[Figure: residuals of peerrate on gre (top) and residuals of peerrate on docgrad (bottom), each plotted against fitted values; each rvfplot is run after the corresponding simple regression]

Given the variable (enrollment) and the shape of these plots, a log transformation is worth exploring.

Logarithmic transformation of the docgrad variable

. gen logdoc = log(docgrad)
. label variable logdoc "log(Number of doctoral degrees granted in 2004)"

. regress peerrate docgrad

      Source |       SS       df       MS              Number of obs =      87
-------------+------------------------------           F(  1,    85) =   28.75
       Model |  44271.3714     1  44271.3714           Prob > F      =  0.0000
    Residual |  130901.042    85  1540.01226           R-squared     =  0.2527
-------------+------------------------------           Adj R-squared =  0.2439
       Total |  175172.414    86  2036.88853           Root MSE      =  39.243

------------------------------------------------------------------------------
    peerrate |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     docgrad |   .6868552   .1281049     5.36   0.000     .4321483    .9415621
       _cons |   313.4533   7.207113    43.49   0.000     299.1236     327.783
------------------------------------------------------------------------------

. regress peerrate logdoc

      Source |       SS       df       MS              Number of obs =      87
-------------+------------------------------           F(  1,    85) =   23.31
       Model |  37702.4852     1  37702.4852           Prob > F      =  0.0000
    Residual |  137469.929    85  1617.29328           R-squared     =  0.2152
-------------+------------------------------           Adj R-squared =  0.2060
       Total |  175172.414    86  2036.88853           Root MSE      =  40.216

------------------------------------------------------------------------------
    peerrate |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      logdoc |   27.27396   5.648818     4.83   0.000     16.04259    38.50532
       _cons |   247.6276     20.588    12.03   0.000     206.6931    288.5621
------------------------------------------------------------------------------

[Figure: Mean peer rating by Deans (100-500) against Number of doctoral degrees granted in 2004 (log scale); residuals of peerrate on log(docgrad) and residuals of peerrate on docgrad, each against fitted values]

Comparison of regression models predicting US News peer ratings from the size and log(size) of the doctoral cohort

n=87                  Model A:                 Model B:
                      peerrate on docgrad      peerrate on log(docgrad)
Estimated slope       .687*                    27.274*
(Estimated S.E.)      (.128)                   (5.649)
t statistic           5.36                     4.83
R-squared             25.27%                   21.52%

* p<.001

A marginal case. Residuals suggest the transformation results in a modest improvement. Keep the original docgrad variable to simplify interpretation.
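
The t statistics and R-squared values in the comparison table can be recomputed from the regression output. The following is a minimal Python sketch rather than part of the original Stata workflow; the dictionary layout and function names are illustrative assumptions, and the numbers are copied from the two regress outputs for this slide.

```python
# Slopes, standard errors, and sums of squares copied from the Stata output
# for Model A (peerrate on docgrad) and Model B (peerrate on logdoc).
MODELS = {
    "A: peerrate on docgrad": {
        "slope": 0.6868552, "se": 0.1281049,
        "ss_model": 44271.3714, "ss_total": 175172.414,
    },
    "B: peerrate on log(docgrad)": {
        "slope": 27.27396, "se": 5.648818,
        "ss_model": 37702.4852, "ss_total": 175172.414,
    },
}


def t_statistic(slope, se):
    """t statistic for testing that the population slope is zero."""
    return slope / se


def r_squared(ss_model, ss_total):
    """Proportion of total variation accounted for by the model."""
    return ss_model / ss_total


for name, m in MODELS.items():
    t = t_statistic(m["slope"], m["se"])
    r2 = r_squared(m["ss_model"], m["ss_total"])
    print(f"Model {name}: t = {t:.2f}, R-squared = {100 * r2:.2f}%")
```

Both models share the same total sum of squares because the outcome is unchanged; only the predictor's scale differs, which is why the R-squared values can be compared directly.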


From simple to multiple regression

$\widehat{peerrate} = -40.52 + .6907\,gre$
$\widehat{peerrate} = 313.45 + .687\,docgrad$

$\widehat{peerrate} = \hat{\beta}_0 + \hat{\beta}_1\,gre + \hat{\beta}_2\,docgrad$

More generally, let X1, X2, … Xk represent k predictors:

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + \epsilon$

How does multiple regression help us?
1. Simultaneous consideration of many contributing factors
2. We explain more of the variation in Y
3. Equivalently, more accurate predictions (smaller residuals)
4. Provides a separate understanding of each predictor, accounting for the effects of other predictors in the model (an attempt at holding the values of the other predictors constant)
5. Allows us to build models that can help support theories and hypotheses about causal and associative mechanisms.


The Stata syntax: regress peerrate gre docgrad (Y, then Xs)

. regress peerrate gre docgrad

      Source |       SS       df       MS              Number of obs =      87
-------------+------------------------------           F(  2,    84) =   58.52
       Model |  101982.433     2  50991.2165           Prob > F      =  0.0000
    Residual |  73189.9808    84  871.309296           R-squared     =  0.5822
-------------+------------------------------           Adj R-squared =  0.5722
       Total |  175172.414    86  2036.88853           Root MSE      =  29.518

------------------------------------------------------------------------------
    peerrate |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gre |    .614299   .0754808     8.14   0.000     .4641971    .7644008
     docgrad |    .540348   .0980256     5.51   0.000     .3454134    .7352827
       _cons |  -22.56979   41.64255    -0.54   0.589    -105.3806    60.24099
------------------------------------------------------------------------------

Population Model:
$peerrate = \beta_0 + \beta_1\,gre + \beta_2\,docgrad + \epsilon$

Sample Prediction Equation:
$\widehat{peerrate} = -22.57 + .614\,gre + .540\,docgrad$

• A 10-point increment in average GREs predicts a 6.14-point increment in peer ratings, assuming docgrad can be held constant. (Simple linear regression coefficient: 6.91).

• A 10-student increment in the size of the graduating doctoral cohort predicts a 5.4-point increment in peer ratings, assuming gre can be held constant. (Simple linear regression coefficient: 6.87)
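
To make the sample prediction equation concrete, here is a small Python sketch (an illustration, not the course's Stata workflow) that computes a fitted value and residual for one school and reproduces the "10-unit increment" interpretations. The function name is a hypothetical helper; the coefficients come from the regress output and Harvard's values from the data listing.

```python
# Coefficients copied from `regress peerrate gre docgrad`.
B0, B_GRE, B_DOCGRAD = -22.56979, 0.614299, 0.540348


def predict_peerrate(gre, docgrad):
    """Fitted peer rating from the sample prediction equation."""
    return B0 + B_GRE * gre + B_DOCGRAD * docgrad


# Harvard in the data listing: peerrate 450, gre 662.5, docgrad 60.
observed = 450
fitted = predict_peerrate(gre=662.5, docgrad=60)
residual = observed - fitted
print(f"fitted = {fitted:.1f}, residual = {residual:.1f}")

# A 10-point GRE increment with docgrad held constant at 60.
gre_effect = predict_peerrate(672.5, 60) - predict_peerrate(662.5, 60)
# A 10-student docgrad increment with gre held constant at 662.5.
doc_effect = predict_peerrate(662.5, 70) - predict_peerrate(662.5, 60)
print(f"10-point GRE increment: {gre_effect:.2f}")
print(f"10-student docgrad increment: {doc_effect:.2f}")
```

The two increments do not depend on the values at which the other predictor is held, because the model has no interaction term.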


From Unit 1: Three “best-fit” regression lines. The line you want, and why.

The OLS criterion minimizes the sum of vertical squared residuals.

Other definitions of “best fit” are possible:

[Figure: three panels: Vertical Squared Residuals (OLS); Horizontal Squared Residuals (X on Y); Orthogonal Residuals]

Minimize vertical squared residuals to the best fit plane

$\widehat{peerrate} = -22.57 + .614\,gre + .540\,docgrad$
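
The idea of a best-fit plane that minimizes vertical squared residuals can be sketched in plain Python. This is an illustration under stated assumptions: it solves the normal equations directly instead of calling Stata's regress, the helper names are hypothetical, and it uses only the 31 schools shown in the data listing, so its estimates will not match the 87-school output.

```python
# (peerrate, gre, docgrad) for the 31 listed schools: a subset of the 87.
DATA = [
    (450, 662.5, 60), (410, 578, 53), (470, 677.5, 38), (440, 604.5, 193),
    (430, 660.5, 22), (390, 677, 10), (440, 605, 43), (380, 604, 61),
    (430, 609, 38), (430, 580, 106), (360, 596, 112), (390, 575, 89),
    (340, 611.5, 39), (420, 586.5, 52), (390, 596, 110), (400, 586.5, 102),
    (370, 593, 37), (410, 633, 50), (360, 569.5, 119), (360, 584.5, 42),
    (400, 541, 61), (400, 566, 66), (400, 576.5, 96), (340, 558.5, 71),
    (360, 591, 51), (360, 600.5, 65), (380, 549, 141), (390, 566, 33),
    (340, 589, 52), (350, 535, 64), (320, 542.5, 76),
]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(data):
    """Return (b0, b_gre, b_docgrad) from the normal equations X'X b = X'y."""
    rows = [(1.0, gre, doc) for _, gre, doc in data]
    ys = [y for y, _, _ in data]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    return solve(xtx, xty)


def sse(coefs, data):
    """Sum of squared vertical residuals between each point and the plane."""
    b0, b1, b2 = coefs
    return sum((y - (b0 + b1 * gre + b2 * doc)) ** 2 for y, gre, doc in data)


best = ols(DATA)
nudged = (best[0], best[1] + 0.05, best[2])  # tilt the plane slightly
print("OLS plane (subset):", [round(b, 3) for b in best])
print("SSE at the OLS plane:", round(sse(best, DATA), 1))
print("SSE after nudging the gre slope:", round(sse(nudged, DATA), 1))
```

Any plane other than the OLS plane, such as the nudged one, yields a larger sum of squared vertical residuals on the same data.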

Regression decomposition from Unit 1

Total deviation, $(Y_i - \bar{Y})$: point to mean-plane
= Regression deviation, $(\hat{Y}_i - \bar{Y})$: plane to mean-plane
+ Error deviation, $(Y_i - \hat{Y}_i)$: point to plane

$\sum (Y_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum (Y_i - \hat{Y}_i)^2$

Sum of Squares Total (SST) = Sum of Squares Model (SSM) + Sum of Squares Error (SSE)

What is $R^2$? The proportion of total variation that is accounted for by the model.

$R^2 = \frac{\sum (\hat{Y}_i - \bar{Y})^2}{\sum (Y_i - \bar{Y})^2} = \frac{SSM}{SST}$

Analysis of Variance regression decomposition in Stata

. regress peerrate gre docgrad

      Source |       SS       df       MS
-------------+------------------------------
       Model |  101982.433     2  50991.2165      (SS Regress)
    Residual |  73189.9808    84  871.309296      (SS Error)
-------------+------------------------------
       Total |  175172.414    86  2036.88853      (SS Total)

Number of obs = 87, F(2, 84) = 58.52, Prob > F = 0.0000, R-squared = 0.5822, Adj R-squared = 0.5722, Root MSE = 29.518

Analysis of Variance regression decomposition

SS Total = SS Regress + SS Error: 175172 ≈ 101982 + 73190

$R^2$ = SS Regress / SS Total = 101982 / 175172 = 58.22%

Interpreting $R^2$

58.22 percent of the variation in the peer ratings is “attributable to” or “accounted for by” or “explained by” or “associated with” or “predicted by” the size of the doctoral cohort and the GRE scores of the admits.

What about the remaining 41.78%? Research, funding, famous people, location, history, measurement error, random error, individual variation, alien abductions… Error is what we haven’t modeled yet.

The Ubiquitous $R^2$

The variance of Y that is accounted for by …

The single most widespread and easily interpretable summary statistic derivable from a single regression analysis. Essential to describing the overall predictive function of the model.

$\sum (Y_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum (Y_i - \hat{Y}_i)^2$
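
The summary statistics in the Stata header all follow from the sums of squares and degrees of freedom in the ANOVA table. A minimal Python sketch of that arithmetic, offered as an illustration (variable names are assumptions; the inputs are copied from the regress peerrate gre docgrad output):

```python
import math

# Sums of squares, sample size, and number of predictors from the ANOVA table.
ss_model, ss_resid = 101982.433, 73189.9808
n, k = 87, 2

ss_total = ss_model + ss_resid          # SST = SSM + SSE
df_model, df_resid, df_total = k, n - k - 1, n - 1

r2 = ss_model / ss_total
adj_r2 = 1 - (ss_resid / df_resid) / (ss_total / df_total)
f_stat = (ss_model / df_model) / (ss_resid / df_resid)
root_mse = math.sqrt(ss_resid / df_resid)

print(f"SS Total      = {ss_total:.3f}")
print(f"R-squared     = {r2:.4f}")
print(f"Adj R-squared = {adj_r2:.4f}")
print(f"F({df_model}, {df_resid})      = {f_stat:.2f}")
print(f"Root MSE      = {root_mse:.3f}")
```

The unexplained share, 1 minus R-squared, is the 41.78% that the slide attributes to everything not yet modeled.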

Multiple regression supports inferences about “statistical control”

$\widehat{peerrate} = -22.57 + .614\,gre + .540\,docgrad$

• A 10-point increment in average GREs predicts a 6.14-point increment in peer ratings, assuming docgrad can be held constant. (Simple linear regression coefficient: 6.91).

• A 10-student increment in the size of the graduating doctoral cohort predicts a 5.4-point increment in peer ratings, assuming gre can be held constant. (Simple linear regression coefficient: 6.87)

• Even accounting for one variable, the other variable still has predictive utility, and vice versa.

• Let’s see what the model implications are for “holding docgrad constant” or “accounting/adjusting for docgrad”… for a typical small school, a midsized school, and a large school.

Distribution of the size of the doctoral cohort in 2004

. dotplot docgrad, mlabel(school) mlabsize(vsmall)

[Figure: dot plot of Number of doctoral degrees granted in 2004 (0-200) by Frequency, points labeled with school names]

. summarize docgrad, detail

         Number of doctoral degrees granted in 2004
-------------------------------------------------------------
      Percentiles      Smallest
 1%            4              4
 5%           10              4
10%           13              6       Obs                  87
25%           21              8       Sum of Wgt.          87

50%           38                      Mean           45.67816
                        Largest       Std. Dev.      33.03293
75%           61            112
90%           89            119       Variance       1091.174
95%          110            141       Skewness       1.629423
99%          193            193       Kurtosis       6.797049

Let’s pick a small school with a doctoral cohort of 20, a midsized school with a doctoral cohort of 45, and a large school with a doctoral cohort of 80.

Conditional Regression Lines: Visualizing the multiple regression model

$\widehat{peerrate} = -22.57 + .614\,gre + .540\,docgrad$

Small school, docgrad = 20:
$\widehat{peerrate} = -22.57 + .614\,gre + .540(20) = -11.77 + .614\,gre$

Midsized school, docgrad = 45:
$\widehat{peerrate} = -22.57 + .614\,gre + .540(45) = 1.73 + .614\,gre$

Large school, docgrad = 80:
$\widehat{peerrate} = -22.57 + .614\,gre + .540(80) = 20.63 + .614\,gre$

[Figure: Mean peer rating by Deans (100-500) against Average mean verbal and quantitative GRE scores, with three parallel conditional regression lines: Large school, Midsized school, Small school]
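
The conditional regression lines are obtained by substituting a fixed docgrad value into the prediction equation and folding it into the intercept. A short Python sketch of that substitution, as an illustration (the function name is a hypothetical helper; the rounded coefficients and the prototypical sizes 20, 45, and 80 are those used on the slides):

```python
# Rounded coefficients from the sample prediction equation.
B0, B_GRE, B_DOCGRAD = -22.57, 0.614, 0.540


def conditional_line(docgrad):
    """Return (intercept, slope) of the peerrate-on-gre line at a fixed docgrad."""
    return B0 + B_DOCGRAD * docgrad, B_GRE


for label, size in [("Small", 20), ("Midsized", 45), ("Large", 80)]:
    intercept, slope = conditional_line(size)
    print(f"{label} school (docgrad = {size}): "
          f"predicted peerrate = {intercept:.2f} + {slope:.3f} * gre")
```

Every line has the same slope, so the lines are parallel; only the intercept shifts, by .540 points per additional doctoral graduate.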


Our Data: Peer Rating

If each of you were a school of education… With peer ratings, cohort sizes, and average GRE scores determined by your location in the 3D space of Larsen G08…

[Figure: dot plot of Peer Rating for S-030 Universities (250-450) by Frequency, points labeled with student names]

Our Data: Cohort Size

[Figure: dot plot of Size of S-030 Universities' graduating classes (0-150) by Frequency, points labeled with student names]

Our Data: Average GREs

[Figure: dot plot of Average GRE scores for S-030 Universities' students (400-800) by Frequency, points labeled with student names]

Our Data: Peer Rating on Cohort Size

[Figure: scatterplot of Peer Rating for S-030 Universities (250-450) against Size of S-030 Universities' graduating classes (0-150), points labeled with student names]

This is clearly a weak, near zero relationship, far weaker than the empirical relationship between peer rating and cohort size. How can you visualize this in this classroom?

Our Data: Peer Rating on Average GRE

[Figure: scatterplot of Peer Rating for S-030 Universities (250-450) against Average GRE scores for S-030 Universities' students (400-800), points labeled with student names]

This is a weak-to-moderate relationship, slightly stronger than the empirical relationship between peer rating and average GRE scores. How can you visualize this in this classroom?

Multiple Regression: Finding the best-fit (hyper)plane

[Figure: best-fit plane, Vertical Squared Residuals (OLS)]

Multiple Regression on Our Data in Stata

. regress ourrate oursize ourgre

      Source |       SS       df       MS              Number of obs =      81
-------------+------------------------------           F(  2,    78) =   36.10
       Model |  137505.113     2  68752.5567           Prob > F      =  0.0000
    Residual |  148544.887    78  1904.42162           R-squared     =  0.4807
-------------+------------------------------           Adj R-squared =  0.4674
       Total |      286050    80    3575.625           Root MSE      =   43.64

------------------------------------------------------------------------------
     ourrate |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     oursize |   .1080475   .1624824     0.66   0.508      -.21543    .4315251
      ourgre |   .4765136   .0560894     8.50   0.000     .3648481    .5881791
       _cons |   73.79312   36.95902     2.00   0.049     .2133642    147.3729
------------------------------------------------------------------------------

$\widehat{ourrate} = 73.79 + .108\,oursize + .477\,ourgre$

• A 100-point increment in average GREs predicts a 47.7-point increment in peer ratings, assuming oursize can be held constant.

• A 100-student increment in the size of the graduating doctoral cohort predicts a 10.8-point increment in peer ratings, assuming ourgre can be held constant.

• 48.07% of the variation in ratings is accounted for by the two predictor variables.

• On your own and in discussion and consultation with your neighbors:
  – Write down your name, and then estimate your PeerRate, DocGrad, and GRE values from your location in this room.
  – Use the prediction equation above and calculate your $\hat{Y}$ (your predicted peer rating, your fitted value).
  – Write down your residual. How can you interpret this residual?
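
The classroom exercise (fitted value, then residual) can be sketched in Python as follows. This is an illustration only: the function name is a hypothetical helper, and the seat values 350, 60, and 600 are made-up inputs rather than any real student's data. The coefficients are copied from the regress ourrate oursize ourgre output.

```python
# Coefficients copied from `regress ourrate oursize ourgre`.
B0, B_SIZE, B_GRE = 73.79312, 0.1080475, 0.4765136


def predict_ourrate(oursize, ourgre):
    """Fitted peer rating for a given cohort size and average GRE."""
    return B0 + B_SIZE * oursize + B_GRE * ourgre


# Hypothetical seat in the room (illustrative values only).
my_rate, my_size, my_gre = 350, 60, 600

fitted = predict_ourrate(my_size, my_gre)
residual = my_rate - fitted  # observed minus predicted
print(f"fitted value = {fitted:.2f}")
print(f"residual     = {residual:.2f}")
```

A negative residual means the observed rating sits below the fitted plane: the rating is lower than the model predicts for that size and GRE.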

Constructing a conditional regression plot

• We can plot peerrate on gre with different docgrad lines, or we can plot peerrate on docgrad with different gre lines.

• Generally, we place the primary predictor of interest on the X axis (let’s say gre) and the “control” predictor or “covariate” on the legend.

• With our data, we can pick prototypical oursize values of 20, 60, and 100.

[Figure: dot plot of Size of S-030 Universities' graduating classes (0-150) by Frequency, points labeled with student names]

[Figure: Mean peer rating by Deans (100-500) against Average mean verbal and quantitative GRE scores, with conditional regression lines for Large (80), Midsized (45), and Small (20) schools]

- Select 2-5 prototypical values
- Substantively interesting values
- A range of percentiles (10 or 25, 50, 75 or 90)
- The sample mean ± .5 or 1 standard deviation
- Easily communicated values, whole numbers or fractions.
- When the variable is already ordinal (ordered categories, e.g., coded 0, 1, and 2 for small, medium, and large), so much the better
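
The candidate rules for prototypical values can be compared side by side. The Python sketch below is an illustration using the docgrad summary statistics reported earlier in this unit (mean, standard deviation, and percentiles from summarize docgrad, detail); the function and dictionary names are assumptions.

```python
# docgrad summary statistics copied from the Stata output in this unit.
MEAN, SD = 45.67816, 33.03293
PERCENTILES = {10: 13, 25: 21, 50: 38, 75: 61, 90: 89}


def mean_band(k):
    """Return (mean - k*SD, mean, mean + k*SD), rounded to one decimal."""
    return tuple(round(v, 1) for v in (MEAN - k * SD, MEAN, MEAN + k * SD))


candidates = {
    "percentiles 25/50/75": tuple(PERCENTILES[p] for p in (25, 50, 75)),
    "percentiles 10/50/90": tuple(PERCENTILES[p] for p in (10, 50, 90)),
    "mean +/- 0.5 SD": mean_band(0.5),
    "mean +/- 1 SD": mean_band(1.0),
    "round numbers used on the slides": (20, 45, 80),
}

for rule, values in candidates.items():
    print(f"{rule}: {values}")
```

The slide's choice of 20, 45, and 80 sits close to the 25th percentile, the mean, and the mean plus one standard deviation, while staying easy to communicate.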

Conditional regression lines for our data

[Figure: Peer Rating for S-030 Universities (250-450) against Average GRE scores for S-030 Universities' students (400-800), with conditional regression lines for Small schools (oursize = 20)* and Large schools (oursize = 100)*]

* Differences between conditional regression lines are not statistically significant (from . regress ourrate oursize ourgre: oursize coefficient .1080475, t = 0.66, P>|t| = 0.508).

Conditional regression lines for our data

[Figure: Peer Rating for S-030 Universities (250-450) against Size of S-030 Universities' graduating classes (0-150), with conditional regression lines for High GRE Schools (ourgre = 700)*, Mid-Range GRE Schools (ourgre = 600)*, and Low GRE Schools (ourgre = 500)*]

* Slopes of conditional regression lines are not statistically significant (cannot be distinguished from 0). From . regress ourrate oursize ourgre: oursize coefficient .1080475, 95% Conf. Interval [-.21543, .4315251].
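
Both versions of the conditional regression plot for our data come from the same fitted equation; only the predictor folded into the intercept changes. The Python sketch below is an illustration with hypothetical helper names; the coefficients, standard error, and confidence interval are copied from the regress ourrate oursize ourgre output.

```python
# Coefficients, standard error, and 95% CI copied from the Stata output.
B0, B_SIZE, B_GRE = 73.79312, 0.1080475, 0.4765136
SE_SIZE = 0.1624824
CI_SIZE = (-0.21543, 0.4315251)


def line_on_gre(oursize):
    """(intercept, slope) of ourrate on ourgre at a fixed oursize."""
    return B0 + B_SIZE * oursize, B_GRE


def line_on_size(ourgre):
    """(intercept, slope) of ourrate on oursize at a fixed ourgre."""
    return B0 + B_GRE * ourgre, B_SIZE


# Vertical gap between the large-school and small-school lines.
gap = line_on_gre(100)[0] - line_on_gre(20)[0]
# t statistic for oursize, and whether its confidence interval covers zero.
t_size = B_SIZE / SE_SIZE
ci_includes_zero = CI_SIZE[0] < 0 < CI_SIZE[1]

print(f"gap between oursize = 100 and oursize = 20 lines: {gap:.2f} points")
for g in (500, 600, 700):
    intercept, slope = line_on_size(g)
    print(f"ourgre = {g}: predicted ourrate = {intercept:.2f} + {slope:.3f} * oursize")
print(f"t for oursize = {t_size:.2f}; 95% CI includes 0: {ci_includes_zero}")
```

The gap between the size lines and the slope of the GRE lines are both governed by the oursize coefficient, so a single non-significant coefficient accounts for both footnotes.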

Interpreting the conditional regression plot for empirical data

• We imagine that the plot extends into the slide along a third axis, docgrad, where larger schools are deeper and smaller schools are closer (or vice versa).

• The regression model fits the best-fit plane through this scatterplot in three-dimensional space, minimizing the vertical squared residuals between each point and the plane.

• Instead of a single regression line of peerrate on gre, we have a regression line for every level of docgrad, extending along the plane.

• Note that the lines are parallel, as they must be under our regression model. We loosen this assumption in Unit 10.

[Figure: Mean peer rating by Deans (100-500) against Average mean verbal and quantitative GRE scores, each school plotted as its docgrad value, with conditional regression lines for Large (80), Midsized (45), and Small (20) schools]

60

53

38

193

22

10

43

61

38106

112

89

39

52

110

102

37

50

119 42

61 66 96

71

51 65

141

33

52

64

76

68

53

86

67

86 70

38

27

17

32

1224

21

4

30

55

18

3620

63

16

23

40

15

19

59

11

46

42

45

34

4

51

6

8

24

29 21

29

586010

36

28

41 27

18

28

30

15

19

1333 18

21

18

300

350

400

450

500

Mea

n pe

er r

atin

g by

Dea

ns (

100-

500)

450 500 550 600 650 700Average mean verbal and quantitative GRE scores

Interpreting a conditional regression plot

© Andrew Ho, Harvard Graduate School of Education Unit 6 / Page 30

Large (80)

Midsized (45)

Small (20)

Adding the other predictor improves prediction, reduces residuals, and reduces residual variance, but residual variance obviously remains.

The magnitude of the coefficient for the axis predictor (gre) can be seen as the slope of the lines. The magnitude of the coefficient for the legend predictor (docgrad) can be seen as the vertical spacing between the lines.
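As a rough sketch of how such a plot could be drawn from the fitted equation, the commands below graph the three conditional lines using the rounded coefficients reported in this unit (-22.57, .614, .540). The use of twoway function is an assumption about one convenient way to draw the lines, not necessarily how the slide's figure was made. The `///` line continuations require running this from a do-file.

```stata
* Conditional regression lines of peerrate on gre at three values of docgrad,
* using the rounded coefficients from the two-predictor model.
twoway (function y = -22.57 + .614*x + .540*80, range(450 700)) ///
       (function y = -22.57 + .614*x + .540*45, range(450 700)) ///
       (function y = -22.57 + .614*x + .540*20, range(450 700)), ///
       legend(order(1 "Large (80)" 2 "Midsized (45)" 3 "Small (20)")) ///
       xtitle("Average mean verbal and quantitative GRE scores") ///
       ytitle("Mean peer rating by Deans (100-500)")

* Vertical spacing between adjacent lines = docgrad coefficient times the size gap
display .540*(80 - 45)
display .540*(45 - 20)
```

All three lines share the slope .614; only the intercept shifts with docgrad, which is why the lines are parallel.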

Illustrative contrasts in regression lines

[Figure: mean peer rating by Deans (100-500) against average mean verbal and quantitative GRE scores (450-700), showing the four regression lines described below.]

Prediction of peer rating by average GRE scores for large schools (docgrad = 80)

Prediction of peer rating by average GRE scores for small schools (docgrad = 20)

Prediction of peer rating by average GRE scores for the average-size schools (docgrad = 45.68). The conditional regression line.

Prediction of peer rating by average GRE scores without accounting for school size. The unconditional regression line. This is not the prediction for an average-sized school. This is the prediction if we knew nothing about size!
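To make the contrast concrete, the arithmetic below compares the conditional line at average size with the unconditional line. It is a sketch using rounded coefficients: the two-predictor values (-22.57, .614, .540) and the gre-only values (-40.52, .691) are taken from the model comparison table later in this unit, and gre = 600 is an arbitrary illustrative value.

```stata
* Intercept of the conditional line at average size (docgrad = 45.68)
display -22.57 + .540*45.68

* Predicted peer rating at gre = 600 under each line
display -22.57 + .614*600 + .540*45.68   // conditional: average-sized school
display -40.52 + .691*600                // unconditional: size unknown
```

The two predictions differ because the unconditional slope (.691) also absorbs the part of docgrad that travels with gre.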

Another perspective on the implications of the same model…

[Figure: dotplot of average mean verbal and quantitative GRE scores (450-700) by frequency (0-8), with each point labeled by school name.]

Let’s pick a low-GRE school of 500, a midrange GRE school of 550, and a high GRE school of 600.

. summarize gre, detail

       Average mean verbal and quantitative GRE scores
-------------------------------------------------------------
      Percentiles      Smallest
 1%        474.5          474.5
 5%        494.5          478.5
10%          504            484       Obs                  87
25%        527.5          489.5       Sum of Wgt.          87

50%        550.5                      Mean           557.8966
                        Largest       Std. Dev.       42.8993
75%        584.5          660.5
90%          605          662.5       Variance        1840.35
95%          654            677       Skewness       .6529338
99%        677.5          677.5       Kurtosis       3.565675

. dotplot gre, mlabel(school) mlabsize(vsmall)

Another perspective on conditional regression lines

[Figure: mean peer rating by Deans (100-500) against number of doctoral degrees granted in 2004 (0-200), with conditional regression lines for a high-scoring school (600), a midrange school (550), and a low-scoring school (500).]

Fitted model: $\widehat{peerrate} = -22.57 + .614\,gre + .540\,docgrad$

Low-scoring school, $gre = 500$: $\widehat{peerrate} = -22.57 + .614(500) + .540\,docgrad = 284.43 + .540\,docgrad$

Midrange school, $gre = 550$: $\widehat{peerrate} = -22.57 + .614(550) + .540\,docgrad = 315.13 + .540\,docgrad$

High-scoring school, $gre = 600$: $\widehat{peerrate} = -22.57 + .614(600) + .540\,docgrad = 345.83 + .540\,docgrad$

• A 10-person increment in the size of the doctoral cohort predicts a 5.4-point increment in the predicted mean peer rating by deans, assuming average GREs can be held constant.

• A 50-point increment in average GRE scores predicts a 30.7-point increment in the predicted mean peer rating by deans, assuming the size of the doctoral cohort can be held constant.
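The same quantities can be pulled from the stored estimates rather than typed by hand. This is a minimal sketch, assuming the course dataset with peerrate, gre, and docgrad is already in memory; results will differ slightly from the slide values because Stata uses unrounded coefficients.

```stata
quietly regress peerrate gre docgrad

* Intercepts of the conditional lines at gre = 500, 550, 600
display _b[_cons] + _b[gre]*500
display _b[_cons] + _b[gre]*550
display _b[_cons] + _b[gre]*600

* Predicted increments for a 10-person and a 50-point change
display 10*_b[docgrad]
display 50*_b[gre]

* lincom reports the same increments with standard errors and confidence intervals
lincom 10*docgrad
lincom 50*gre
```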

ANalysis Of VAriance (ANOVA) regression decomposition for multiple regression

. regress peerrate gre docgrad

      Source |       SS       df       MS              Number of obs =      87
-------------+------------------------------           F(  2,    84) =   58.52
       Model |  101982.433     2  50991.2165           Prob > F      =  0.0000
    Residual |  73189.9808    84  871.309296           R-squared     =  0.5822
-------------+------------------------------           Adj R-squared =  0.5722
       Total |  175172.414    86  2036.88853           Root MSE      =  29.518

------------------------------------------------------------------------------
    peerrate |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gre |    .614299   .0754808     8.14   0.000     .4641971    .7644008
     docgrad |    .540348   .0980256     5.51   0.000     .3454134    .7352827
       _cons |  -22.56979   41.64255    -0.54   0.589    -105.3806    60.24099
------------------------------------------------------------------------------

Analysis of Variance

Source               Sum of Squares                           df                       Mean Square
Model (Regression)   $SSR = \sum_i (\hat{Y}_i - \bar{Y})^2$   $k$ (# of predictors)    $MSR = SSR/df_{SSR}$
Residual (Error)     $SSE = \sum_i (Y_i - \hat{Y}_i)^2$       $n - k - 1$              $MSE = SSE/df_{SSE}$
Total                $SST = \sum_i (Y_i - \bar{Y})^2$         $n - 1$                  $MST = SST/df_{SST}$

- MS stands for Mean Square, an average sum of squares, like a variance.
- In fact, MS Total *is* your variance, the unconditional variance of the peerrate variable, the square of the standard deviation: $MST = s_{peerrate}^2 = 2036.89$.
- MS Model, the model mean square, is a measure of the variation accounted for by the model (not easily interpretable on the scale). The bigger the better.
- MS Residual, your residual variance, is interpretable as error variance ($\hat{\sigma}^2_{\varepsilon} = 871.31$, the square of Root MSE).

Everything to this point is the same as simple linear regression, except that we are paying more attention to the mean squares.
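As a check on the decomposition, the mean squares can be rebuilt from the results that regress stores. This is a sketch assuming the same dataset is in memory; e(mss), e(rss), e(df_m), e(df_r), and e(N) are standard stored results after regress.

```stata
quietly regress peerrate gre docgrad

display e(mss)/e(df_m)                // MS Model    = SSR/k
display e(rss)/e(df_r)                // MS Residual = SSE/(n-k-1)
display (e(mss) + e(rss))/(e(N) - 1)  // MS Total    = SST/(n-1)
display e(mss)/(e(mss) + e(rss))      // R-squared   = SSR/SST
display sqrt(e(rss)/e(df_r))          // Root MSE

* MS Total should match the sample variance of the outcome
quietly summarize peerrate
display r(Var)
```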

The F distribution, a sampling distribution for variance ratios

Analysis of Variance

Source               Sum of Squares                           df                       Mean Square
Model (Regression)   $SSR = \sum_i (\hat{Y}_i - \bar{Y})^2$   $k$ (# of predictors)    $MSR = SSR/df_{SSR}$
Residual (Error)     $SSE = \sum_i (Y_i - \hat{Y}_i)^2$       $n - k - 1$              $MSE = SSE/df_{SSE}$
Total                $SST = \sum_i (Y_i - \bar{Y})^2$         $n - 1$                  $MST = SST/df_{SST}$

$F = \dfrac{MSR}{MSE}$

- If we sample two variances (MSR and MSE) from two populations with equal variances, and if we take repeated ratios (MSR/MSE), sometimes the numerator (MSR) will be larger, and sometimes the denominator (MSE) will be larger.
- The distribution of these ratios will be loosely centered on 1 and will always be positive.
- If one variance is craaaaazy bigger than another, we might conclude that our null hypothesis of equal population variances is incorrect and accept the alternative: unequal population variances.

The F distribution has two degrees-of-freedom parameters. The first is the sample size from the first distribution, minus 1. The second is the sample size from the second distribution, minus 1.

F distribution: http://www.capdm.com/demos/software/html/capdm/qm/fdist/usage.html
Sampling distributions: http://onlinestatbook.com/stat_sim/sampling_dist/index.html

The omnibus F test for multiple regression

Analysis of Variance

Source               Sum of Squares                           df                       Mean Square
Model (Regression)   $SSR = \sum_i (\hat{Y}_i - \bar{Y})^2$   $k$ (# of predictors)    $MSR = SSR/df_{SSR}$
Residual (Error)     $SSE = \sum_i (Y_i - \hat{Y}_i)^2$       $n - k - 1$              $MSE = SSE/df_{SSE}$
Total                $SST = \sum_i (Y_i - \bar{Y})^2$         $n - 1$                  $MST = SST/df_{SST}$

$F = \dfrac{MSR}{MSE}$

- This particular F statistic represents variance accounted for by the regression model over error variance (unaccounted for).
- In other words, good variance over bad variance. We want F to be as large as possible.
- Under the null hypothesis, there is no predictive value to *any* of your predictors in the population: $H_0: \beta_1 = \beta_2 = \dots = 0$.
- If the model variance is sufficiently greater than the error variance, we can reject $H_0$.
- We accept $H_a$: One or more population slopes are nonzero.

Numerator degrees of freedom: This is $k$, the number of predictors.

Denominator degrees of freedom: This is $n - k - 1$, a quantity that increases with sample size.

MSR and MSE are scaled (by dividing by their degrees of freedom) such that, under the null hypothesis, they act like sample variances from populations with equal variance.
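For the peer rating model, the F ratio and its p-value can be reproduced by hand. This is a small sketch using the mean squares and degrees of freedom from the regress output in this unit; Ftail() is Stata's upper-tail F probability function.

```stata
* F = MSR/MSE, using the mean squares from the ANOVA table
display 50991.2165/871.309296

* Upper-tail probability of F = 58.52 with k = 2 and n-k-1 = 84 degrees of freedom
display Ftail(2, 84, 58.52)
```

The first line should agree with the F(2, 84) = 58.52 in the output, and the second with the Prob > F that Stata prints as 0.0000.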

The omnibus F test

• Null hypothesis: The predictors account for no Y variance in the population. The predictors have no predictive utility in the population. $H_0: \beta_1 = \beta_2 = \dots = 0$.

• Test statistic: $F = MSR/MSE$. Ratio of regression model variance to error variance. Variance accounted for over variance unaccounted for.

• Decision rule: If $F$ exceeds the critical value, reject $H_0$ in favor of $H_a$: one or more population slopes are nonzero (some $\beta_j \neq 0$); the model has some predictive utility. Here $j$ indexes one of the $k$ predictor slopes.

Critical values of F (α = .05)

df for numerator (MSR)     df for denominator (MSE)
                             25       50      100      inf
    1                      4.24     4.03     3.94     3.84
    2                      3.39     3.18     3.09     3.00
    3                      2.99     2.79     2.70     2.60
    4                      2.76     2.56     2.46     2.37
    5                      2.60     2.40     2.31     2.21
   10                      2.24     2.03     1.93     1.83
   20                      2.01     1.78     1.68     1.57
  120                      1.77     1.64     1.38     1.22
 1000                      1.72     1.56     1.30     1.00

Stata returns critical values as: display invFtail(k, n-k-1, .05)

. regress peerrate gre docgrad

      Source |       SS       df       MS              Number of obs =      87
-------------+------------------------------           F(  2,    84) =   58.52
       Model |  101982.433     2  50991.2165           Prob > F      =  0.0000
    Residual |  73189.9808    84  871.309296           R-squared     =  0.5822
-------------+------------------------------           Adj R-squared =  0.5722
       Total |  175172.414    86  2036.88853           Root MSE      =  29.518

------------------------------------------------------------------------------
    peerrate |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gre |    .614299   .0754808     8.14   0.000     .4641971    .7644008
     docgrad |    .540348   .0980256     5.51   0.000     .3454134    .7352827
       _cons |  -22.56979   41.64255    -0.54   0.589    -105.3806    60.24099
------------------------------------------------------------------------------

In this output, the Model df (2) is k, the number of predictors, and the Residual df (84) is n-k-1.

The probability of sampling an F ratio of 58.52 or larger under the null hypothesis of no predictive utility is 0.0000… very, very low, so we reject the null hypothesis and conclude that one or more predictors account for some variance in the population.
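Applied to this model, the decision rule looks like the sketch below, with k = 2 and n-k-1 = 84 taken from the output. The critical value should fall between the tabled entries for 50 and 100 denominator degrees of freedom (3.18 and 3.09).

```stata
* Critical value of F at alpha = .05 for 2 and 84 degrees of freedom
display invFtail(2, 84, .05)

* Decision rule: displays 1 if the observed F exceeds the critical value
display 58.52 > invFtail(2, 84, .05)
```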

The omnibus F test vs. slope tests

. regress peerrate gre docgrad

      Source |       SS       df       MS              Number of obs =      87
-------------+------------------------------           F(  2,    84) =   58.52
       Model |  101982.433     2  50991.2165           Prob > F      =  0.0000
    Residual |  73189.9808    84  871.309296           R-squared     =  0.5822
-------------+------------------------------           Adj R-squared =  0.5722
       Total |  175172.414    86  2036.88853           Root MSE      =  29.518

------------------------------------------------------------------------------
    peerrate |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gre |    .614299   .0754808     8.14   0.000     .4641971    .7644008
     docgrad |    .540348   .0980256     5.51   0.000     .3454134    .7352827
       _cons |  -22.56979   41.64255    -0.54   0.589    -105.3806    60.24099
------------------------------------------------------------------------------

Omnibus F test
Across all of my predictors, is regression helping at all in the population?
$H_0: \beta_1 = \beta_2 = \dots = 0$. In the population, regression does not help at all.
$H_a$: some $\beta_j \neq 0$. In the population, this set of predictors has some predictive utility.

t-tests for slopes
In the population, does this variable help prediction above and beyond other included variables?
$H_0: \beta_j = 0$. In the population, this predictor has no effect when accounting for other predictors in the model.
$H_a: \beta_j \neq 0$. In the population, this predictor has an effect when accounting for other predictors in the model.

Three equivalent tests in simple linear regression

Let’s revisit a simple linear regression model analogous to one you’re considering in Assignment #3: Math Achievement (TIMSS) on the log of income (per capita GDP) for countries.

Test 1: The omnibus F test. Evaluates whether any of the population slopes are nonzero. With only one slope…

Test 2: The t-test for slope. Evaluates whether a particular population slope is nonzero…

Fun fact 1: When there is only one predictor (that is, the degrees of freedom in the numerator is 1), then $t^2 = F$, and the p-values from the two tests will be identical (.0315).

Test 3: The t-test for correlation (pwcorr command). Evaluates whether a population correlation is nonzero.

With a single predictor, the significance of the omnibus prediction, the significance of a single slope, and the significance of an association are identical.

With k > 1 predictors, this equivalence does not hold.

Fun fact 2: When k = 1, R-sq = $r^2$, the square of the sample correlation between the outcome and the predictor.
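Both fun facts can be checked on any single-predictor model. The sketch below uses the peer rating data (peerrate on gre) rather than the TIMSS example, since the TIMSS variable names are not given here; that substitution is an assumption made purely for illustration.

```stata
quietly regress peerrate gre

* Fun fact 1: the squared t statistic for the slope equals the omnibus F
display (_b[gre]/_se[gre])^2
display e(F)

* Fun fact 2: R-sq equals the squared correlation when k = 1
display e(r2)
quietly correlate peerrate gre
display r(rho)^2

* Test 3: the correlation test, whose p-value matches the two tests above
pwcorr peerrate gre, sig
```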

Summarizing regression output from multiple models: estout

Find and install the estout package with the following command:

    findit estout

Then scroll past the search results to the third listed package under the Web Resources heading. Click the link, then click the helpful link to the right, labeled “Click here to install.”

As always, you will begin by conducting a thorough review of the univariate and bivariate distributions and statistics, exploring various models and diagnostics, and considering remedial options.

At a certain point, you may consider saving promising, helpful, illustrative, or otherwise benchmark-setting models for your review. This can be done with the estout package.

Documentation: http://repec.org/bocode/e/estout/index.html

------------------------------------------------------------
                      (1)             (2)             (3)
                 peerrate        peerrate        peerrate
------------------------------------------------------------
gre                 0.691***                        0.614***
                   (8.02)                          (8.14)

docgrad                             0.687***        0.540***
                                   (5.36)          (5.51)

_cons              -40.52           313.5***       -22.57
                  (-0.84)         (43.49)         (-0.54)
------------------------------------------------------------
N                      87              87              87
------------------------------------------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001
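A three-column table like this one can be produced with a sequence along the following lines. This is a sketch, assuming the dataset is in memory. The commented-out ssc install line is an alternative installation route not mentioned on the slide.

```stata
* ssc install estout        // alternative to the findit route above

eststo clear                                   // start with no stored models
eststo: quietly regress peerrate gre           // model (1)
eststo: quietly regress peerrate docgrad       // model (2)
eststo: quietly regress peerrate gre docgrad   // model (3)

esttab                                         // default table: coefficients, t statistics, N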

Important esttab options

Predicting US News & World Report Peer Ratings, 2006
------------------------------------------------------------
                  Model A         Model B         Model C
------------------------------------------------------------
Avg GRE             0.691***                        0.614***
                   (8.02)                          (8.14)

PhD Size                            0.687***        0.540***
                                   (5.36)          (5.51)

_cons              -40.52           313.5***       -22.57
                  (-0.84)         (43.49)         (-0.54)
------------------------------------------------------------
N                      87              87              87
R-sq                0.431           0.253           0.582
adj. R-sq           0.424           0.244           0.572
F                   64.40           28.75           58.52
df_m                    1               1               2
df_r                   85              85              84
------------------------------------------------------------
t statistics in parentheses
PhD Size is the number of conferred PhDs in 2004
* p<0.05, ** p<0.01, *** p<0.001

Add a descriptive title and short but descriptive coeflabels.

Descriptive model titles (mtitles) should be added for reference and numbers suppressed (nonumbers).

With many models, you may need the compress option, although this will limit the length of your coeflabels and mtitles.

APA guidelines require that figures and tables be self-contained, so a table note (addnote) can be useful.

Finally, note our desired statistics, R-sq, adj-R-sq, F, and our degrees of freedom for the model and for the residual.
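One way to combine these options is sketched below. The title, labels, and note are copied from the table above; the particular options requesting the statistics (r2, ar2, and scalars()) are an assumption about how the table was produced. Run it from a do-file so that the `///` continuations are honored.

```stata
eststo clear
eststo: quietly regress peerrate gre
eststo: quietly regress peerrate docgrad
eststo: quietly regress peerrate gre docgrad

esttab, title("Predicting US News & World Report Peer Ratings, 2006") ///
    coeflabels(gre "Avg GRE" docgrad "PhD Size")                      ///
    mtitles("Model A" "Model B" "Model C") nonumbers                  ///
    r2 ar2 scalars(F df_m df_r)                                       ///
    addnote("PhD Size is the number of conferred PhDs in 2004")
```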

Other esttab features to note

• If you store too many models with eststo, clear its memory with eststo clear. Then you’ll have to use eststo: quietly regress … to store again from zero.
• Never forget the importance of help, e.g., help esttab
• When presenting tables, it is generally more common to include standard errors, not t statistics, in parentheses. To do this, use the se option.
• Likewise, for parsimony, the F statistic and degrees of freedom are not commonly reported, although they can be helpful for model selection.
• The R-sq is a necessity. The adjusted R-sq helps in model selection (you’ll soon see why), but it is less commonly reported unless model selection is the focus.

Predicting US News & World Report Peer Ratings, 2006
------------------------------------------------------------
                  Model A         Model B         Model C
------------------------------------------------------------
Avg GRE             0.691***                        0.614***
                   (8.02)                          (8.14)

PhD Size                            0.687***        0.540***
                                   (5.36)          (5.51)

_cons              -40.52           313.5***       -22.57
                  (-0.84)         (43.49)         (-0.54)
------------------------------------------------------------
N                      87              87              87
R-sq                0.431           0.253           0.582
adj. R-sq           0.424           0.244           0.572
F                   64.40           28.75           58.52
df_m                    1               1               2
df_r                   85              85              84
------------------------------------------------------------
t statistics in parentheses
PhD Size is the number of conferred PhDs in 2004
* p<0.05, ** p<0.01, *** p<0.001
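A leaner presentation table, following the advice above, might be requested as in this sketch. It assumes the three models are still stored from earlier eststo calls; if not, store them again first.

```stata
* Standard errors in parentheses, R-sq only, no F or degrees of freedom
esttab, se r2                                       ///
    coeflabels(gre "Avg GRE" docgrad "PhD Size")    ///
    mtitles("Model A" "Model B" "Model C") nonumbers

* Start over if too many models have accumulated
eststo clear
```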

Reporting results for the multiple regression model

Predicting US News & World Report Peer Ratings, 2006
------------------------------------------------------------
                  Model A         Model B         Model C
------------------------------------------------------------
Avg GRE             0.691***                        0.614***
                   (8.02)                          (8.14)

PhD Size                            0.687***        0.540***
                                   (5.36)          (5.51)

_cons              -40.52           313.5***       -22.57
                  (-0.84)         (43.49)         (-0.54)
------------------------------------------------------------
N                      87              87              87
R-sq                0.431           0.253           0.582
adj. R-sq           0.424           0.244           0.572
F                   64.40           28.75           58.52
df_m                    1               1               2
df_r                   85              85              84
------------------------------------------------------------
t statistics in parentheses
PhD Size is the number of conferred PhDs in 2004
* p<0.05, ** p<0.01, *** p<0.001

Reporting statistical significance:

The two-predictor model with average GRE score and the size of the doctoral cohort accounts for 58.2% of the variation in peer ratings.

The omnibus null hypothesis of no predictive utility can be rejected, F(2,84)=58.52, p<.001.

(The omnibus p-value is not shown in this table and must be gathered from the original regression output as Prob>F.)

The coefficient for the average GRE score is statistically significant, t(84)=8.14, p<.001. The model suggests that an increment of 10 points in average GRE score is associated with an increment of 6.14 points on the peer rating scale, if the size of the doctoral cohort could be held constant.

The coefficient for the size of the doctoral cohort is statistically significant, t(84)=5.51, p<.001. The model implies that an increment of 10 students in doctoral degree conferral predicts an increment of 5.40 points on the peer rating scale, accounting for the average GRE score of the students.
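The numbers quoted in these sentences can be assembled from stored results instead of being read off the screen. This is a sketch assuming the dataset is in memory; the display formats are an arbitrary choice.

```stata
quietly regress peerrate gre docgrad

* Omnibus test: F(df_m, df_r), its p-value, and R-sq
display "F(" e(df_m) ", " e(df_r) ") = " %5.2f e(F)
display "p = " %6.4f Ftail(e(df_m), e(df_r), e(F))
display "R-sq = " %5.3f e(r2)

* Slope test for gre: t(df_r) and its two-sided p-value
display "t(" e(df_r) ") = " %4.2f _b[gre]/_se[gre]
display "p = " %6.4f 2*ttail(e(df_r), abs(_b[gre]/_se[gre]))

* The 10-point increment reported in the text
display 10*_b[gre]
```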

Beginning to look across the models

Predicting US News & World Report Peer Ratings, 2006
------------------------------------------------------------
                  Model A         Model B         Model C
------------------------------------------------------------
Avg GRE             0.691***                        0.614***
                   (8.02)                          (8.14)

PhD Size                            0.687***        0.540***
                                   (5.36)          (5.51)

_cons              -40.52           313.5***       -22.57
                  (-0.84)         (43.49)         (-0.54)
------------------------------------------------------------
N                      87              87              87
R-sq                0.431           0.253           0.582
adj. R-sq           0.424           0.244           0.572
F                   64.40           28.75           58.52
df_m                    1               1               2
df_r                   85              85              84
------------------------------------------------------------
t statistics in parentheses
PhD Size is the number of conferred PhDs in 2004
* p<0.05, ** p<0.01, *** p<0.001

The average GRE score alone accounts for 43.1% of the variance in peer ratings, and the size of the graduating doctoral cohort alone accounts for 25.3% of the variance in peer ratings. Together, these two variables account for 58.2% of the variance in peer ratings. Note that 43.1% + 25.3% = 68.4% of the variance, yet only 58.2% of the variance is accounted for by both variables. Why the difference? The percentages will sum tidily if and only if the predictors have zero correlation with each other, a veritable impossibility with real data outside of a coding error.

Each predictor’s estimated coefficient decreases in magnitude when the other predictor is included in the model. However, the signs are unchanged, the decline does not appear to be substantively significant, and the statistical significance of the two predictors remains unchanged. Our inferences about the prediction of peer ratings are reasonably robust across the model forms and compositions shown in this table. Given that our interpretations are robust to model choice, and the two-predictor model is substantively defensible, we opt for the model that supports the best predictions.
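The overlap between the two predictors can be quantified in a line or two. This sketch uses the R-sq values from the table and assumes the dataset is in memory for the correlate command.

```stata
* Sum of the single-predictor R-sq values versus the two-predictor R-sq
display .431 + .253
display .431 + .253 - .582    // variance that the naive sum counts twice

* The predictors are correlated, which is why the percentages do not sum tidily
correlate gre docgrad
```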

Diagnostic plots for multiple regression

Our regression assumptions haven’t changed: independent, normally distributed residuals with equal variance centered on 0. Our plots remain the same, as well: residuals vs. fitted values (rvfplot), leverage vs. discrepancy (lvr2plot), and Cook’s distance, to name a few.

[Figures: (1) Cook's distance (0 to .25) against fitted values (300-450), with labeled points including Stanford, Vanderbilt, Berkeley, NYU, and Delaware; (2) leverage (0 to .25) against normalized residual squared (0 to .08), with every school labeled; (3) standardized/studentized residual (-3 to 2) against fitted values (300-450), with Delaware labeled.]

We speak now in terms of discrepancy from, leverage on, and influence upon the estimated regression plane.
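The three plots can be requested after fitting the two-predictor model, roughly as sketched below. The label variable school comes from the dotplot command earlier in this unit; the new variable names (yhat, cooksd, rstu) are hypothetical and can be anything not already in the dataset.

```stata
quietly regress peerrate gre docgrad

* Residuals vs. fitted values, and leverage vs. squared normalized residuals
rvfplot, yline(0) mlabel(school)
lvr2plot, mlabel(school)

* Cook's distance and studentized residuals against fitted values
predict yhat, xb
predict cooksd, cooksd
predict rstu, rstudent
scatter cooksd yhat, mlabel(school)
scatter rstu yhat, mlabel(school) yline(0)
```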

Midterm Check-in

• Assignment #3 will be passed back by email over the next 24 hours.
• Assignment #4 has been posted, partners mandatory.
– Notify me by Friday if you do not have a partner. No exceptions.
– https://docs.google.com/spreadsheet/ccc?key=0AuXHUPzC9fGPdGdSc0VacnFzUnI5YmswbkoxNGE3YWc#gid=0
• Partnerships take work. Open communication. Constant upkeep.
• Attendance. Come to class regularly.
• Resources: Lectures, videos, sections, partnerships, study groups, my office hours.
• Course Philosophies: Disciplined Perception. Layering. Language. Collaboration. Sensitivity. Statistical vs. Substantive. Exploratory Analysis.

What are the takeaways from this unit?

• Multiple regression holds several advantages over single-predictor regression:
– Simultaneous consideration of multiple correlated predictors
– Improved prediction of outcomes
– Parsimony in model expression
– Support of inferences like, “the effect of X while accounting for Z”
– More control over models consistent with our theories and hypotheses
• Construction and conceptualization of 2D and 3D representations of two-predictor models in terms of best-fit planes
• ANOVA decomposition and R-sq for multiple regression
• The omnibus F test vs. t-tests for slopes
• The estout package and the tabular juxtaposition of models
• Regression diagnostics in multiple regression