
© Andrew Ho, Harvard Graduate School of Education

Unit 6: The basics of multiple regression Class 14… Class 15…

http://xkcd.com/314/

http://xkcd.com/314/605/


Where is Unit 6 in our 11-Unit Sequence?

Building a solid foundation
  Unit 1: Introduction to simple linear regression
  Unit 2: Correlation and causality
  Unit 3: Inference for the regression model

Mastering the subtleties
  Unit 4: Regression assumptions: Evaluating their tenability
  Unit 5: Transformations to achieve linearity

Adding additional predictors
  Unit 6: The basics of multiple regression
  Unit 7: Statistical control in depth: Correlation and collinearity

Generalizing to other types of predictors and effects
  Unit 8: Categorical predictors I: Dichotomies
  Unit 9: Categorical predictors II: Polychotomies
  Unit 10: Interaction and quadratic effects

Pulling it all together
  Unit 11: Regression in practice. Common Extensions.


In this unit, we’re going to cover…

• Various representations of the multiple regression model:
  – An algebraic representation
  – A three dimensional graphic representation
  – A two dimensional graphic representation
• Multiple regression: how it works and helps improve predictions
  – Estimating the parameters of the multiple regression model
  – Holding predictors constant: what does this really mean?
• Plotting the fitted multiple regression model:
  – Deciding how to construct the plot
  – Choosing prototypical values
  – Learning how to actually construct the plot (and interpret it correctly!)
• $R^2$ and the Analysis of Variance (ANOVA) in multiple regression
• Inference in multiple regression
  – The omnibus F-test in multiple regression
  – Individual t-tests
• How might we summarize MR results in both tables and figures?


US News and World Report education school rankings from 2006

How do student characteristics like GRE scores and size of the doctoral class predict peer ratings of Ed Schools? Do schools gain in reputation for graduating large numbers of high-achieving students?

http://grad-schools.usnews.rankingsandreviews.com/best-graduate-schools/top-education-schools/edu-rankings


As always, our starting point

. su peerrate gre docgrad

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
    peerrate |        87    344.8276     45.1319        280        470
         gre |        87    557.8966     42.8993      474.5      677.5
     docgrad |        87    45.67816    33.03293          4        193

[Figures: four histograms (vertical axis: frequency) of mean peer rating by Deans (100-500), number of doctoral degrees granted in 2004, average mean verbal and quantitative GRE scores, and log(number of doctoral degrees granted in 2004).]

. list school peerrate gre docgrad, clean

              school   peerrate     gre   docgrad
  1.         Harvard        450   662.5        60
  2.            UCLA        410     578        53
  3.        Stanford        470   677.5        38
  4.              TC        440   604.5       193
  5.      Vanderbilt        430   660.5        22
  6.    Northwestern        390     677        10
  7.        Berkeley        440     605        43
  8.            Penn        380     604        61
  9.        Michigan        430     609        38
 10.         Madison        430     580       106
 11.             NYU        360     596       112
 12.         MinneTC        390     575        89
 13.          Oregon        340   611.5        39
 14.   MichiganState        420   586.5        52
 15.         Indiana        390     596       110
 16.        UTAustin        400   586.5       102
 17.      Washington        370     593        37
 18.          Urbana        410     633        50
 19.             USC        360   569.5       119
 20.              BC        360   584.5        42
 21.       OhioState        400     541        61
 22.        Maryland        400     566        66
 23.             Uva        400   576.5        96
 24.             GWU        340   558.5        71
 25.         Florida        360     591        51
 26.            Iowa        360   600.5        65
 27.         Georgia        380     549       141
 28.      ChapelHill        390     566        33
 29.           Uconn        340     589        52
 30.          Kansas        350     535        64
 31.          Temple        320   542.5        76

(Only the first 31 rows of the listing appear on the slide.)

Scatterplot matrix: graph matrix

. graph matrix peerrate gre docgrad

[Figure: scatterplot matrix of mean peer rating by Deans (100-500), average mean verbal and quantitative GRE scores, and number of doctoral degrees granted in 2004.]


Use graph combine to be more succinct.

. scatter peerrate gre || lfit peerrate gre, legend(off) ytitle("Mean peer rating by Deans (100-500)") name(peer_gre, replace)

. scatter peerrate docgrad || lfit peerrate docgrad, legend(off) ytitle("Mean peer rating by Deans (100-500)") name(peer_doc, replace)

. graph combine peer_gre peer_doc, cols(1) fxsize(60)

[Figure: stacked scatterplots with fitted lines of mean peer rating by Deans (100-500) on average mean verbal and quantitative GRE scores (top) and on number of doctoral degrees granted in 2004 (bottom).]

Each rvfplot below is run immediately after the corresponding regress command (regress peerrate gre, then regress peerrate docgrad).

. rvfplot, yline(0) ytitle(Residuals of peerrate on gre) name(rvf_gre, replace)

. rvfplot, yline(0) ytitle(Residuals of peerrate on docgrad) name(rvf_doc, replace)

. graph combine rvf_gre rvf_doc, cols(1) fxsize(60)

[Figure: stacked residual-versus-fitted plots for peerrate on gre (top) and peerrate on docgrad (bottom), each with a reference line at 0.]

Given the variable (enrollment) and the shape of these plots, a log transformation is worth exploring.


Logarithmic transformation of the docgrad variable

[Figures: scatterplot of mean peer rating by Deans (100-500) on number of doctoral degrees granted in 2004 (log scale), with residual plots for peerrate on docgrad and for peerrate on log(docgrad).]

. regress peerrate docgrad

      Source |       SS       df       MS              Number of obs =      87
-------------+------------------------------           F(  1,    85) =   28.75
       Model |  44271.3714     1  44271.3714           Prob > F      =  0.0000
    Residual |  130901.042    85  1540.01226           R-squared     =  0.2527
-------------+------------------------------           Adj R-squared =  0.2439
       Total |  175172.414    86  2036.88853           Root MSE      =  39.243

    peerrate |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     docgrad |   .6868552   .1281049     5.36   0.000     .4321483    .9415621
       _cons |   313.4533   7.207113    43.49   0.000     299.1236     327.783

. gen logdoc = log(docgrad)

. label variable logdoc "log(Number of doctoral degrees granted in 2004)"

. regress peerrate logdoc

    peerrate |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      logdoc |   27.27396   5.648818     4.83   0.000     16.04259    38.50532
       _cons |   247.6276     20.588    12.03   0.000     206.6931    288.5621

Comparison of regression models predicting US News peer ratings from the size and log(size) of the doctoral cohort

n=87                 Model A         Model B
                     on docgrad      on log(docgrad)
Estimated Slope      .687*           27.274*
Estimated S.E.       (.128)          (5.649)
t statistic          5.36            4.83
R-squared            25.27%          21.52%

* p<.001

A marginal case. Residuals suggest the transformation results in a modest improvement. Keep the original docgrad variable to simplify interpretation.


From simple to multiple regression

Simple linear regression, one predictor at a time:

$\widehat{peerrate} = -40.52 + .6907\,gre$
$\widehat{peerrate} = 313.45 + .687\,docgrad$

Multiple regression, both predictors at once:

$\widehat{peerrate} = \hat{\beta}_0 + \hat{\beta}_1\,gre + \hat{\beta}_2\,docgrad$

More generally, let X1, X2, … Xk represent k predictors:

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + \epsilon$

How does multiple regression help us?
1. Simultaneous consideration of many contributing factors
2. We explain more of the variation in $Y$
3. Equivalently, more accurate predictions (smaller residuals)
4. Provides a separate understanding of each predictor, accounting for the effects of other predictors in the model (an attempt at holding the values of the other predictors constant)
5. Allows us to build models that can help support theories and hypotheses about causal and associative mechanisms.


The Stata syntax: regress peerrate gre docgrad (Y, then Xs)

. regress peerrate gre docgrad

    peerrate |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gre |    .614299   .0754808     8.14   0.000     .4641971    .7644008
     docgrad |    .540348   .0980256     5.51   0.000     .3454134    .7352827
       _cons |  -22.56979   41.64255    -0.54   0.589    -105.3806    60.24099

Population Model:
$peerrate = \beta_0 + \beta_1\,gre + \beta_2\,docgrad + \epsilon$

Sample Prediction Equation:
$\widehat{peerrate} = -22.57 + .614\,gre + .540\,docgrad$

• A 10-point increment in average GREs predicts a 6.14-point increment in peer ratings, assuming docgrad can be held constant. (Simple linear regression coefficient: 6.91).

• A 10-student increment in the size of the graduating doctoral cohort predicts a 5.4-point increment in peer ratings, assuming gre can be held constant. (Simple linear regression coefficient: 6.87)

From Unit 1: Three “best-fit” regression lines. The line you want, and why.

The OLS criterion minimizes the sum of vertical squared residuals.

Other definitions of “best fit” are possible:

• Vertical Squared Residuals (OLS)
• Horizontal Squared Residuals (X on Y)
• Orthogonal Residuals


Minimize vertical squared residuals to the best fit plane



Regression decomposition from Unit 1

Total deviation = Regression deviation + Error deviation
$(Y_i - \bar{Y}) = (\hat{Y}_i - \bar{Y}) + (Y_i - \hat{Y}_i)$
Point to mean-plane = Plane to mean-plane + Point to plane

$\sum (Y_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum (Y_i - \hat{Y}_i)^2$
Sum of Squares Total ($SST$) = Sum of Squares Model ($SSM$) + Sum of Squares Error ($SSE$)

What is $R^2$?
$R^2 = \frac{SSM}{SST} = \frac{\sum (\hat{Y}_i - \bar{Y})^2}{\sum (Y_i - \bar{Y})^2}$
The proportion of total variation that is accounted for by the model.

Analysis of Variance regression decomposition in Stata

SS Total ≈ SS Regress + SS Error
175172 ≈ 101982 + 73190
$R^2$ = 101982 / 175172 = 58.22%

Interpreting $R^2$

58.22 percent of the variation in the peer ratings is “attributable to” or “accounted for by” or “explained by” or “associated with” or “predicted by” the size of the doctoral cohort and the GRE scores of the admits.

What about the remaining 41.78%? Research, funding, famous people, location, history, measurement error, random error, individual variation, alien abductions… Error is what we haven’t modeled yet.

The Ubiquitous $R^2$

The variance of $Y$ that is accounted for by the predictors.

The single most widespread and easily interpretable summary statistic derivable from a single regression analysis. Essential to describing the overall predictive function of the model.
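The decomposition above can be checked by hand. This is a small sketch, assuming the rounded sums of squares shown on the slide for regress peerrate gre docgrad; the scalar names are illustrative.

```stata
* R-squared from the sums of squares for regress peerrate gre docgrad.
scalar ss_model = 101982
scalar ss_error = 73190
scalar ss_total = ss_model + ss_error
scalar r2       = ss_model/ss_total

display "SS Total  = " ss_total
display "R-squared = " %6.4f r2
display "Unexplained share = " %6.4f 1 - r2
```

Because the slide's sums of squares are rounded, the result agrees with Stata's reported R-squared only to about four decimal places.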



Multiple regression supports inferences about “statistical control”

• A 10-point increment in average GREs predicts a 6.14-point increment in peer ratings, assuming docgrad can be held constant. (Simple linear regression coefficient: 6.91).

• A 10-student increment in the size of the graduating doctoral cohort predicts a 5.4-point increment in peer ratings, assuming gre can be held constant. (Simple linear regression coefficient: 6.87)

• Even accounting for one variable, the other variable still has predictive utility, and vice versa.

• Let’s see what the model implications are for “holding docgrad constant” or “account/adjust for docgrad”… for a typical small school, a midsized school, and a large school.



Distribution of the size of the doctoral cohort in 2004

. dotplot docgrad, mlabel(school) mlabsize(vsmall)

[Figure: dot plot of number of doctoral degrees granted in 2004 (0-200) by frequency, with each point labeled by school. TC is the largest, followed by Georgia and USC.]

. summarize docgrad, detail

         Number of doctoral degrees granted in 2004
-------------------------------------------------------------
      Percentiles      Smallest
 1%            4              4
 5%           10              4
10%           13              6       Obs                  87
25%           21              8       Sum of Wgt.          87

50%           38                      Mean           45.67816
                        Largest       Std. Dev.      33.03293
75%           61            112
90%           89            119       Variance       1091.174
95%          110            141       Skewness       1.629423
99%          193            193       Kurtosis       6.797049

Let’s pick a small school with a doctoral cohort of 20, a midsized school with a doctoral cohort of 45, and a large school with a doctoral cohort of 80.

Conditional Regression Lines: Visualizing the multiple regression model

$\widehat{peerrate} = -22.57 + .614\,gre + .540\,docgrad$

Small school, docgrad = 20: $\widehat{peerrate} = -22.57 + .614\,gre + .540(20) = -11.77 + .614\,gre$

Midsized school, docgrad = 45: $\widehat{peerrate} = -22.57 + .614\,gre + .540(45) = 1.73 + .614\,gre$

Large school, docgrad = 80: $\widehat{peerrate} = -22.57 + .614\,gre + .540(80) = 20.63 + .614\,gre$

[Figure: three parallel conditional regression lines for mean peer rating by Deans (100-500), labeled Large school, Midsized school, and Small school.]



Our Data: Peer Rating

[Figure: dot plot of peer rating for S-030 universities (250-450) by frequency, with each point labeled by a student's name.]

If each of you were a school of education… with peer ratings, cohort sizes, and average GRE scores determined by your location in the 3D space of Larsen G08…


Our Data: Cohort Size

[Figure: dot plot of the size of S-030 universities' graduating classes (0-150) by frequency, with each point labeled by a student's name.]

Our Data: Average GREs

[Figure: dot plot of average GRE scores for S-030 universities' students (400-800) by frequency, with each point labeled by a student's name.]

Our Data: Peer Rating on Cohort Size

[Figure: scatterplot of peer rating for S-030 universities (250-450) on size of S-030 universities' graduating classes (0-150), with each point labeled by a student's name.]

This is clearly a weak, near-zero relationship, far weaker than the empirical relationship between peer rating and cohort size. How can you visualize this in this classroom?

Our Data: Peer Rating on Average GRE

[Figure: scatterplot of peer rating for S-030 universities (250-450) on average GRE scores for S-030 universities' students (400-800), with each point labeled by a student's name.]

This is a weak-to-moderate relationship, slightly stronger than the empirical relationship between peer rating and average GRE scores. How can you visualize this in this classroom?


Multiple Regression: Finding the best-fit (hyper)plane


Vertical Squared Residuals (OLS)

Multiple Regression on Our Data in Stata

. regress ourrate oursize ourgre

      Source |       SS       df       MS              Number of obs =      81
-------------+------------------------------           F(  2,    78) =   36.10
       Model |  137505.113     2  68752.5567           Prob > F      =  0.0000
    Residual |  148544.887    78  1904.42162           R-squared     =  0.4807
-------------+------------------------------           Adj R-squared =  0.4674
       Total |      286050    80    3575.625           Root MSE      =   43.64

     ourrate |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     oursize |   .1080475   .1624824     0.66   0.508      -.21543    .4315251
      ourgre |   .4765136   .0560894     8.50   0.000     .3648481    .5881791
       _cons |   73.79312   36.95902     2.00   0.049     .2133642    147.3729

$\widehat{ourrate} = 73.79 + .108\,oursize + .477\,ourgre$

• A 100-point increment in average GREs predicts a 47.7-point increment in peer ratings, assuming oursize can be held constant.

• A 100-student increment in the size of the graduating doctoral cohort predicts a 10.8-point increment in peer ratings, assuming ourgre can be held constant.

• 48.07% of the variation in ratings is accounted for by the two predictor variables.

• On your own and in discussion and consultation with your neighbors:

– Write down your name, and then estimate your PeerRate, DocGrad, and GRE values from your location in this room.

– Use the prediction equation above and calculate your $\hat{Y}$ (your predicted peer rating, your fitted value).

– Write down your residual. How can you interpret this residual?

Constructing a conditional regression plot

[Figure: the dot plot of the size of S-030 universities' graduating classes (0-150) by frequency, shown again for choosing prototypical values.]

• We can plot peerrate on gre with different docgrad lines, or we can plot peerrate on docgrad with different gre lines.

• Generally, we place the primary predictor of interest on the x-axis (let’s say gre) and the “control” predictor or “covariate” in the legend.

• With our data, we can pick prototypical oursize values of 20, 60, and 100.
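One way to draw such a plot is sketched below, using twoway function so that no predicted values need to be stored. It assumes the rounded coefficients from the regress ourrate oursize ourgre output and the prototypical oursize values of 20, 60, and 100; the macro names, axis range, and titles are illustrative choices. Because it uses local macros and /// line continuations, it is meant to be run from a do-file.

```stata
* Conditional regression lines of ourrate on ourgre at three oursize values.
* Coefficients are rounded from the regress ourrate oursize ourgre output.
local b0     = 73.79
local b_size = .108
local b_gre  = .477

* Fold each prototypical oursize value into the intercept.
local a_small = `b0' + `b_size'*20
local a_mid   = `b0' + `b_size'*60
local a_large = `b0' + `b_size'*100

* Same slope for every line, so the lines are parallel by construction.
twoway (function y = `a_small' + `b_gre'*x, range(400 800)) ///
       (function y = `a_mid'   + `b_gre'*x, range(400 800)) ///
       (function y = `a_large' + `b_gre'*x, range(400 800)), ///
       legend(order(1 "Small (oursize = 20)" 2 "Midsized (oursize = 60)" 3 "Large (oursize = 100)")) ///
       xtitle("Average GRE scores") ytitle("Predicted peer rating")
```

The vertical spacing between adjacent lines is .108 × 40, about 4.3 rating points, which is why the lines sit so close together for these data.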


[Figure: conditional regression lines for mean peer rating by Deans (100-500), with legend entries Large (80), Midsized (45), and Small (20).]

- Select 2-5 prototypical values
- Substantively interesting values
- A range of percentiles (10 or 25, 50, 75 or 90)
- The sample mean ± .5 or 1 standard deviation
- Easily communicated values, whole numbers or fractions.
- When the variable is already ordinal (ordered categories, e.g., coded 0, 1, and 2 for small, medium, and large), so much the better

Conditional regression lines for our data

[Figure: conditional regression lines of peer rating for S-030 universities (250-450) on average GRE scores for S-030 universities' students (400-800), for Small schools (oursize = 20)* and Large schools (oursize = 100)*.]

* Differences between conditional regression lines are not statistically significant.




Conditional regression lines for our data

[Figure: conditional regression lines of peer rating for S-030 universities (250-450) on size of S-030 universities' graduating classes (0-150), for High GRE Schools (ourgre = 700)*, Mid-Range GRE Schools (ourgre = 600)*, and Low GRE Schools (ourgre = 500)*.]

* Slopes of conditional regression lines are not statistically significant (cannot be distinguished from 0).

Interpreting the conditional regression plot for empirical data

• We imagine that the plot extends into the slide along a third axis, docgrad, where larger schools are deeper and smaller schools are closer (or vice versa).

• The regression model fits the best fit *plane* through this scatterplot in three dimensional space, minimizing the vertical squared residuals between each point and the plane.

• Instead of a single regression line of peerrate on gre, we have a regression line for every level of docgrad, extending along the plane.

• Note that the lines are parallel, as they must be under our regression model. We loosen this assumption in Unit 10.

[Figure: scatterplot of mean peer rating by Deans (100-500) in which each point is labeled with the school's docgrad value, overlaid with conditional regression lines for Large (80), Midsized (45), and Small (20) schools.]


Interpreting a conditional regression plot

[Figure: conditional regression lines for mean peer rating by Deans (100-500), labeled Large (80), Midsized (45), and Small (20).]

Adding the other predictor improves prediction, reduces residuals, and reduces residual variance, but residual variance obviously remains.

The magnitude of the axis predictor can be seen as the slope. The magnitude of the legend predictor can be seen as the spacing between the lines.


Illustrative contrasts in regression lines


Prediction of peer rating by average GRE scores for large schools (docgrad = 80)

Prediction of peer rating by average GRE scores for small schools (docgrad = 20)

Prediction of peer rating by average GRE scores for the average-size schools (docgrad = 45.68). The conditional regression line.

Prediction of peer rating by average GRE scores without accounting for school size. The unconditional regression line. This is not the prediction for an average-sized school. This is the prediction if we knew nothing about size!


Another perspective on the implications of the same model…

. dotplot gre, mlabel(school) mlabsize(vsmall)

[Figure: dot plot of average mean verbal and quantitative GRE scores (450-700) by frequency, with each point labeled by school. Northwestern and Stanford are the highest.]

. summarize gre, detail

       Average mean verbal and quantitative GRE scores
-------------------------------------------------------------
      Percentiles      Smallest
 1%        474.5          474.5
 5%        494.5          478.5
10%          504            484       Obs                  87
25%        527.5          489.5       Sum of Wgt.          87

50%        550.5                      Mean           557.8966
                        Largest       Std. Dev.       42.8993
75%        584.5          660.5
90%          605          662.5       Variance        1840.35
95%          654            677       Skewness       .6529338
99%        677.5          677.5       Kurtosis       3.565675

Let’s pick a low-GRE school of 500, a midrange GRE school of 550, and a high GRE school of 600.

Another perspective on conditional regression lines

Low-scoring school, gre = 500: $\widehat{peerrate} = -22.57 + .614(500) + .540\,docgrad = 284.43 + .540\,docgrad$

Midrange school, gre = 550: $\widehat{peerrate} = -22.57 + .614(550) + .540\,docgrad = 315.13 + .540\,docgrad$

High-scoring school, gre = 600: $\widehat{peerrate} = -22.57 + .614(600) + .540\,docgrad = 345.83 + .540\,docgrad$

[Figure: conditional regression lines for mean peer rating by Deans (100-500), labeled High-scoring school (600), Midrange school (550), and Low-scoring school (500).]

• A 10-person increment in the size of the doctoral cohort predicts a 5.4-point increment in the predicted mean peer rating by deans, assuming average GREs can be held constant.

• A 50-point increment in average GRE scores predicts a 30.7-point increment in the predicted mean peer rating by deans, assuming the size of the doctoral cohort can be held constant.

ANalysis Of VAriance (ANOVA) regression decomposition for multiple regression

. regress peerrate gre docgrad

Analysis of Variance

Source                Sum of Squares                            df                            Mean Square
Model (Regression)    $SSR = \sum (\hat{Y}_i - \bar{Y})^2$      k (number of predictors)      MSR = SSR/df_SSR
Residual (Error)      $SSE = \sum (Y_i - \hat{Y}_i)^2$          n – k – 1                     MSE = SSE/df_SSE
Total                 $SST = \sum (Y_i - \bar{Y})^2$            n – 1                         MST = SST/df_SST

- MS stands for Mean Square, an average sum of squares, like a variance.
- In fact, MS Total *is* your variance, the unconditional variance of the peerrate variable, the square of the standard deviation: $45.13^2 \approx 2036.89$.
- MS Model, the model mean square, is a measure of the variation accounted for by the model (not easily interpretable on the $Y$ scale). The bigger the better.
- MS Residual, your residual variance, is interpretable as error variance ($\hat{\sigma}^2_\epsilon$, the square of the Root MSE).

Everything to this point is the same as simple linear regression, except that we are paying more attention to the mean squares.
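As a worked check on the ANOVA table, the mean squares for the two-predictor model can be rebuilt from the sums of squares. This sketch assumes the rounded values 101982, 73190, and 175172 shown earlier with n = 87 and k = 2; the scalar names are illustrative.

```stata
* Mean squares and F ratio for regress peerrate gre docgrad.
scalar n_obs  = 87
scalar k_pred = 2

scalar msr = 101982/k_pred                 // model mean square
scalar mse = 73190/(n_obs - k_pred - 1)    // residual mean square
scalar mst = 175172/(n_obs - 1)            // total mean square = variance of peerrate

scalar f_ratio = msr/mse

display "MSR = " %9.2f msr
display "MSE = " %9.2f mse "   Root MSE = " %6.2f sqrt(mse)
display "MST = " %9.2f mst "   SD of peerrate = " %6.2f sqrt(mst)
display "F   = " %9.2f f_ratio
```

The F ratio lands at about 58.5, in line with the 58.52 that Stata reports for this model, and the square root of MST recovers the standard deviation of peerrate from the summary table.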

The F distribution, a sampling distribution for variance ratios

$F = \frac{MSR}{MSE}$

- If we sample two variances (MSR and MSE) from two populations with equal variances, and if we take repeated ratios (MSR/MSE), sometimes the numerator (MSR) will be larger, and sometimes the denominator (MSE) will be larger.
- The distribution will be loosely centered on 1 and will always be positive.
- If one variance is craaaaazy bigger than another, we might conclude that our null hypothesis of equal population variances is incorrect and accept the alternative: unequal population variances.

The F distribution has two degrees-of-freedom parameters: one is the sample size from the first distribution (-1), the other is the sample size from the second distribution (-1).

F distribution: http://www.capdm.com/demos/software/html/capdm/qm/fdist/usage.html
Sampling distributions: http://onlinestatbook.com/stat_sim/sampling_dist/index.html

The omnibus F test for multiple regression

$F = \frac{MSR}{MSE}$

- This particular F statistic represents variance accounted for by the regression model over error variance (unaccounted for).
- In other words, good variance over bad variance. We want F to be as large as possible.
- Under the null hypothesis, there is no predictive value to *any* of your predictors in the population: $H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$
- If the model variance is sufficiently greater than the error variance, we can reject $H_0$.
- We accept $H_a$: One or more population slopes are nonzero.

Numerator degrees of freedom: this is $k$, the number of predictors.
Denominator degrees of freedom: this is $n - k - 1$, a quantity that increases with sample size.

MSR and MSE are scaled (by dividing by their degrees of freedom) such that, under the null hypothesis, they act like sample variances from populations with equal variance.

• Null hypothesis: $H_0$: The predictors account for no Y variance in the population. The predictors have no predictive utility in the population.

• Test statistic: $F = MSR/MSE$. Ratio of regression model variance to error variance. Variance accounted for over variance unaccounted for.

• Decision rule: If $F$ exceeds the critical value (equivalently, $p < .05$), reject $H_0$ in favor of $H_a$: one or more population slopes are nonzero; the model has some predictive utility, some …

The omnibus F test

$H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$

Here $j$ indexes one of the $k$ predictor slopes, $\beta_j$.

Critical values of F (α=.05)

df for numerator        df for denominator (MSE)
(MSR)                 25        50       100       inf
     1              4.24      4.03      3.94      3.84
     2              3.39      3.18      3.09      3.00
     3              2.99      2.79      2.70      2.60
     4              2.76      2.56      2.46      2.37
     5              2.60      2.40      2.31      2.21
    10              2.24      2.03      1.93      1.83
    20              2.01      1.78      1.68      1.57
   120              1.77      1.64      1.38      1.22
  1000              1.72      1.56      1.30      1.00

Stata returns critical values as: display invFtail(k, n-k-1, .05)
  Numerator df: k ~ number of predictors
  Denominator df: n-k-1

The probability of sampling an F ratio of 58.52 or larger under the null hypothesis of no predictive utility is 0.0000… very, very low, so we reject the null hypothesis and conclude that one or more predictors account for some variance in the population.
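For the two-predictor model (k = 2, n = 87, so 84 denominator degrees of freedom), the critical value and the p-value can be pulled directly from Stata's F-distribution functions. A minimal sketch, with illustrative scalar names:

```stata
* Critical value and p-value for the omnibus F test with k = 2 and n = 87.
scalar f_crit = invFtail(2, 87 - 2 - 1, .05)   // F needed to reject at alpha = .05
scalar p_val  = Ftail(2, 87 - 2 - 1, 58.52)    // P(F >= 58.52) under the null

display "Critical F(2, 84) at alpha = .05: " %5.2f f_crit
display "P(F >= 58.52) under H0:           " p_val
```

The critical value falls between the table entries for 50 and 100 denominator degrees of freedom (3.18 and 3.09), and the observed F of 58.52 is far beyond it, which is why Stata prints Prob > F = 0.0000.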



The omnibus F test vs. slope tests

Omnibus F test
Across all of my predictors, is regression helping at all in the population?

  $H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$
  In the population, regression does not help at all.

  $H_a: \text{some } \beta_j \neq 0$
  In the population, this set of predictors has some predictive utility.

t-tests for slopes
In the population, does this variable help prediction above and beyond other included variables?

  $H_0: \beta_j = 0$
  In the population, this predictor has no effect when accounting for other predictors in the model.

  $H_a: \beta_j \neq 0$
  In the population, this predictor has an effect when accounting for other predictors in the model.
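Each slope test is just the coefficient divided by its standard error, referred to a t distribution with n − k − 1 degrees of freedom. The sketch below reproduces the two slope tests by hand, assuming the coefficients and standard errors from the regress peerrate gre docgrad output; the scalar names are illustrative.

```stata
* t statistics and two-sided p-values for the slopes in regress peerrate gre docgrad.
scalar t_gre = .614299/.0754808
scalar t_doc = .540348/.0980256

* Residual degrees of freedom: n - k - 1 = 87 - 2 - 1 = 84.
scalar p_gre = 2*ttail(84, abs(t_gre))
scalar p_doc = 2*ttail(84, abs(t_doc))

display "gre:     t = " %5.2f t_gre "   p = " p_gre
display "docgrad: t = " %5.2f t_doc "   p = " p_doc
```

Both t statistics match the regression table (8.14 and 5.51), so each predictor has predictive utility above and beyond the other.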

Three equivalent tests in simple linear regression


Let’s revisit a simple linear regression model analogous to one you’re considering in Assignment #3: Math Achievement (TIMSS) on the log of income (per capita GDP) for countries.

Test 1: The omnibus F-test
Evaluates whether any of the population slopes are nonzero.

With only one slope…

Test 2: The t-test for slope
Evaluates whether a particular population slope is nonzero…

Fun fact 1: When there is only one predictor (that is, the degrees of freedom in the numerator is 1), then $t^2 = F$, and the p-values from the two tests will be identical (.0315).

Test 3: The t-test for correlation (pwcorr command)
Evaluates whether a population correlation is nonzero.

With a single predictor, the significance of the omnibus prediction, the significance of a single slope, and the significance of an association are identical.

With k > 1 predictors, this equivalence does not hold.

Fun fact 2: When k = 1, R-sq = $r^2$, the squared sample correlation between the predictor and the outcome.
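
The two fun facts can be checked directly in Stata. The sketch below is illustrative only: it assumes the education school data used elsewhere in this unit are in memory, with the variable names peerrate and gre that appear in the esttab output, rather than the TIMSS data from this slide. For that single-predictor model the reported values are t = 8.02 and F = 64.40, and 8.02 squared is about 64.32, which matches up to rounding.

```stata
* Assumes the education school data are in memory (variables peerrate, gre)
regress peerrate gre

* Fun fact 1: with one predictor, the squared t statistic equals F
display "t squared = " (_b[gre]/_se[gre])^2
display "F         = " e(F)

* Fun fact 2: with one predictor, R-sq equals the squared correlation
correlate peerrate gre
display "r squared = " r(rho)^2
display "R-sq      = " e(r2)

* Test 3: the correlation with its p-value
pwcorr peerrate gre, sig
```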


Summarizing regression output from multiple models: estout

Find and install the estout package with the following command:

findit estout

and scroll past the search results to the third listed package under the Web Resources heading. Click the link, then click the helpful link to the right, labeled "Click here to install."

As always, you will begin by conducting a thorough review of the univariate and bivariate distributions and statistics, exploring various models and diagnostics, and considering remedial options.

At a certain point, you may consider saving promising, helpful, illustrative, or otherwise benchmark-setting models for your review. This can be done with the estout package.

Documentation: http://repec.org/bocode/e/estout/index.html


                     (1)             (2)             (3)
                peerrate        peerrate        peerrate
gre             0.691***                        0.614***
                  (8.02)                          (8.14)
docgrad                         0.687***        0.540***
                                  (5.36)          (5.51)
_cons             -40.52        313.5***          -22.57
                 (-0.84)         (43.49)         (-0.54)
N                     87              87              87
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001
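
A table like this one can be produced with a short sequence of eststo and esttab commands. The following is a minimal sketch, assuming the education school data are in memory with the variable names shown in the table (peerrate, gre, docgrad); the ssc install line is offered as an alternative installation route to findit, not as the route these slides use.

```stata
* One-time installation from SSC (alternative to findit estout)
ssc install estout

* Start from an empty set of stored models
eststo clear

* Store each model quietly, then tabulate them side by side
eststo: quietly regress peerrate gre
eststo: quietly regress peerrate docgrad
eststo: quietly regress peerrate gre docgrad
esttab
```

With no options, esttab reports coefficients with t statistics in parentheses, significance stars, and N, which is the layout shown above.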



Important esttab options


Predicting US News & World Report Peer Ratings, 2006
                 Model A         Model B         Model C
Avg GRE         0.691***                        0.614***
                  (8.02)                          (8.14)
PhD Size                        0.687***        0.540***
                                  (5.36)          (5.51)
_cons             -40.52        313.5***          -22.57
                 (-0.84)         (43.49)         (-0.54)
N                     87              87              87
R-sq               0.431           0.253           0.582
adj. R-sq          0.424           0.244           0.572
F                  64.40           28.75           58.52
df_m                   1               1               2
df_r                  85              85              84
t statistics in parentheses
PhD Size is the number of conferred PhDs in 2004
* p<0.05, ** p<0.01, *** p<0.001

Add a descriptive title and short but descriptive coeflabels.

Descriptive model titles (mtitles) should be added for reference and numbers suppressed (nonumbers).

With many models, you may need the compress option, although this will limit the length of your coeflabels and mtitles.

APA guidelines require that figures and tables be self-contained, so a table note (addnote) can be useful.

Finally, note our desired statistics, R-sq, adj-R-sq, F, and our degrees of freedom for the model and for the residual.
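
The options named on this slide can be combined in a single esttab call. This is a sketch under the assumption that the three models have already been stored with eststo and that the predictors are named gre and docgrad; the title, labels, and note text are copied from the table above. The /// line continuations work in a do-file, not when typed at the command line.

```stata
esttab, ///
    title("Predicting US News & World Report Peer Ratings, 2006") ///
    mtitles("Model A" "Model B" "Model C") nonumbers ///
    coeflabels(gre "Avg GRE" docgrad "PhD Size") ///
    addnotes("PhD Size is the number of conferred PhDs in 2004") ///
    r2 ar2 scalars(F df_m df_r)
```

Adding the se option would swap the t statistics for standard errors, and compress would narrow the columns when many models are shown.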

Other esttab features to note

• If you store too many models with eststo, clear its memory with eststo clear. Then you'll have to use eststo: quietly regress … to store again from zero.
• Never forget the importance of help, e.g., help esttab
• When presenting tables, it is generally more common to include standard errors, not t statistics, in parentheses. To do this, use the se option.
• Likewise, for parsimony, the F statistic and degrees of freedom are not commonly reported, although they can be helpful for model selection.
• The R-sq is a necessity. The adjusted R-sq helps in model selection (you'll soon see why), but it is less commonly reported unless model selection is the focus.




Reporting results for the multiple regression model

Reporting statistical significance:

The two-predictor model with average GRE score and the size of the doctoral cohort accounts for 58.2% of the variation in peer ratings.

The omnibus null hypothesis of no predictive utility can be rejected, F(2,84)=58.52, p<.001.
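
As a check on how these two reported quantities fit together, the omnibus F ratio can be recovered from R-sq and the degrees of freedom. The worked equation below uses the standard relationship between F and R-sq, with k = 2 predictors and n = 87 schools taken from the table; the small gap from the reported 58.52 reflects rounding R-sq to three decimals.

```latex
F = \frac{R^2 / k}{(1 - R^2) / (n - k - 1)}
  = \frac{0.582 / 2}{0.418 / 84}
  \approx \frac{0.291}{0.004976}
  \approx 58.5
```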



(The omnibus p-value is not shown in this table and must be gathered from the original regression output as Prob>F.)

The coefficient for the average GRE score is statistically significant, t(84)=8.14, p<.001. The model suggests that an increment of 10 points in average GRE score is associated with an increment of 6.14 points on the peer rating scale, if the size of the doctoral cohort could be held constant.

The coefficient for the size of the doctoral cohort is statistically significant, t(84)=5.51, p<.001. The model implies that an increment of 10 students in doctoral degree conferral predicts an increment of 5.40 points on the peer rating scale, accounting for the average GRE score of the students.
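
The 10-unit increments quoted above are simply ten times the estimated slopes. One way to have Stata report them, along with standard errors and confidence intervals, is lincom; this is a sketch assuming the variable names peerrate, gre, and docgrad from the esttab output.

```stata
* Fit the two-predictor model
regress peerrate gre docgrad

* Predicted difference in peer rating for a 10-point difference in average
* GRE, holding doctoral cohort size constant
lincom 10*gre

* Predicted difference for 10 additional doctoral graduates, holding average
* GRE constant
lincom 10*docgrad
```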

Beginning to look across the models





The average GRE score alone accounts for 43.1% of the variance in peer ratings, and the size of the graduating doctoral cohort alone accounts for 25.3% of the variance in peer ratings. Together, these two variables account for 58.2% of the variance in peer ratings. Note that 43.1% + 25.3% = 68.4% of the variance, yet only 58.2% of the variance is accounted for by both variables. Why the difference? The percentages will sum tidily if and only if the predictors have zero correlation with each other, a veritable impossibility with real data outside of a coding error.
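
The arithmetic behind this comparison can be laid out explicitly. The lines below use only the three R-sq values from the table; the label "overlap" is informal shorthand for the variance that the separate models count twice, not a term used in these slides.

```latex
\begin{align*}
R^2_{\text{GRE}} + R^2_{\text{PhD}} &= 0.431 + 0.253 = 0.684 \\
R^2_{\text{both}} &= 0.582 \\
\text{gain from adding PhD Size to Avg GRE} &= 0.582 - 0.431 = 0.151 \\
\text{gain from adding Avg GRE to PhD Size} &= 0.582 - 0.253 = 0.329 \\
\text{overlap} &= 0.684 - 0.582 = 0.102
\end{align*}
```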

Each predictor’s estimated coefficient decreases in magnitude when the other predictor is included in the model. However, the signs are unchanged, the decline does not appear to be substantively significant, and the statistical significance of the two predictors remains unchanged. Our inferences about the prediction of peer ratings are reasonably robust across the model forms and compositions shown in this table. Given that our interpretations are robust to model choice, and the two-predictor model is substantively defensible, we opt for the model that supports the best predictions.


Diagnostic plots for multiple regression


Our regression assumptions haven’t changed: independent, normally distributed residuals with equal variance centered on 0. Our plots remain the same, as well: residuals vs. fitted values (rvfplot), leverage vs. discrepancy (lvr2plot), and Cook’s distance, to name a few.
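
The diagnostics named above can be requested after fitting the two-predictor model. In this sketch the variable school (a string variable holding each institution's name, used for point labels) and the new variable names cooksd, rstud, and id are assumptions for illustration; peerrate, gre, and docgrad are the names shown in the esttab output.

```stata
regress peerrate gre docgrad

* Residuals vs. fitted values, with a reference line at zero
rvfplot, yline(0) mlabel(school)

* Leverage vs. normalized residual squared
lvr2plot, mlabel(school)

* Cook's distance and studentized residuals saved as new variables
predict cooksd, cooksd
predict rstud, rstudent

* Plot Cook's distance against observation order
generate id = _n
scatter cooksd id, mlabel(school)
```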

[Figure: three diagnostic plots for the two-predictor model, with points labeled by school. (1) Cook's distance, with Stanford, Vanderbilt, Berkeley, NYU, and Delaware labeled. (2) Leverage vs. normalized residual squared, with all schools labeled. (3) Standardized/studentized residuals, with Delaware labeled.]

We speak now in terms of discrepancy from, leverage on, and influence upon the estimated regression plane.


Midterm Check-in

• Assignment #3 will be passed back by email over the next 24 hours.
• Assignment #4 has been posted, partners mandatory.
  – Notify me by Friday if you do not have a partner. No exceptions.
  – https://docs.google.com/spreadsheet/ccc?key=0AuXHUPzC9fGPdGdSc0VacnFzUnI5YmswbkoxNGE3YWc#gid=0
• Partnerships take work. Open communication. Constant upkeep.
• Attendance. Come to class regularly.
• Resources: Lectures, videos, sections, partnerships, study groups, my office hours.
• Course Philosophies: Disciplined Perception. Layering. Language. Collaboration. Sensitivity. Statistical vs. Substantive. Exploratory Analysis.


What are the takeaways from this unit?

• Multiple regression holds several advantages over single-predictor regression
  – Simultaneous consideration of multiple correlated predictors
  – Improved prediction of outcomes
  – Parsimony in model expression
  – Support of inferences like, "the effect of X while accounting for Z"
  – More control over models consistent with our theories and hypotheses
• Construction and conceptualization of 2D and 3D representations of two-predictor models in terms of best-fit planes
• ANOVA decomposition and R-sq for multiple regression
• The omnibus F test vs. t-tests for slopes
• The estout package and the tabular juxtaposition of models
• Regression diagnostics in multiple regression
