
Inferential Methods in Regression and Correlation

Chapter 11

Lecture 27 4/15/12 1

Announcements

•  Make-up for Final Exam

–  fill out the form TWO WEEKS AHEAD of time (by Friday, Apr 20th),
–  scheduled ahead of time,
–  taken at the testing center…

Back to Ch.3 (Linear Regression):

•  Recall Simple Linear Regression:
–  Visualization, by scatterplot of response vs. predictor
–  Fit a line to the data when you see a linear trend
–  Minimize the errors using the least-squares (LS) method
–  Get estimates of slope and intercept accordingly
–  Random residuals

•  In this chapter, we introduce
–  The normal error regression model
–  Considering the “regression” as a sample that we want to draw inference on
–  Backward elimination to choose the “effective” variables in MLR.

Motivation

•  Remember when we used to estimate µ?
•  What did we do?
–  Used confidence intervals to give a guess where µ falls
–  Used hypothesis testing to check specific hypotheses for µ
•  Treat the regression line similarly
–  Need to understand the sampling distribution again!

Normal Error Regression Model


A Toy Example

•  Suppose we use Age to predict Blood Pressure
–  Which is X? Y?
•  Draw a picture… (for the bivariate data: Age and BP)
•  For any fixed x, the dependent variable y has a normal distribution
–  The mean of y falls on the “population regression line”
–  P(3 < y < 5) = ? Using the Normal Error Regression Model…
•  Another way to say the same thing is just: ei ~ N(0, σ)
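A probability such as P(3 < y < 5) can be sketched in a few lines. This is only an illustration in Python (not the SAS used in the course), and the values of α, β, σ and x below are hypothetical, chosen to keep the arithmetic easy. It assumes the model y = α + βx + e with e ~ N(0, σ), so that y at a fixed x is normal with mean α + βx.

```python
from statistics import NormalDist

def prob_y_between(lo, hi, x, alpha, beta, sigma):
    """P(lo < y < hi) at a fixed x when y ~ N(alpha + beta*x, sigma)."""
    mean_y = alpha + beta * x                  # mean of y lies on the population line
    dist = NormalDist(mu=mean_y, sigma=sigma)  # sigma is the error standard deviation
    return dist.cdf(hi) - dist.cdf(lo)

# Hypothetical parameter values (assumptions, not from the lecture).
p = prob_y_between(3, 5, x=2, alpha=2.0, beta=1.0, sigma=1.0)
print(round(p, 4))
```

With these made-up values the mean of y at x = 2 is 4, so the interval (3, 5) is one standard deviation on either side of the mean.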

Estimating the slope and intercept


Why This Model?


Only estimates!!!

•  Of course, these are only sample estimates
–  If we took a different sample, we would get different estimates
–  Need to use these estimates a and b to draw inference about the “real” intercept and slope, α and β
•  Need to know the sampling distribution…
–  No problem.
–  We already know it’s normal; the important statement is this again: ei ~ N(0, σ)
•  Meanings of a and b:
–  Intercept: a, the predicted value of y when x = 0.
–  Slope: b, the amount of increase (or change) in predicted y for a 1-unit increase in x.
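As a rough illustration of where a and b come from, the least-squares estimates can be computed directly from the data. The sketch below is in Python (not the course's SAS), and the five data points are hypothetical. The formulas b = Sxy/Sxx and a = ȳ − b·x̄ are the usual least-squares ones, assumed here because the slide that states them did not survive.

```python
def least_squares(x, y):
    """Return (a, b): least-squares intercept and slope for y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx            # slope: change in predicted y per unit of x
    a = y_bar - b * x_bar    # intercept: predicted y when x = 0
    return a, b

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = least_squares(x, y)
print(round(a, 4), round(b, 4))
```

A different sample would give different values of a and b, which is why their sampling distribution matters.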

Estimating the error variance

•  From the model, we know that ei ~ N(0, σ)
•  To estimate σ we use the residuals
•  After the slope and intercept are estimated, the residuals are calculated as: $e_i = y_i - \hat{y}_i$
•  SSE is calculated as: $SSE = \sum e_i^2 = \sum (y_i - \hat{y}_i)^2$
•  It is used to estimate σ by: $s_e = \sqrt{\dfrac{SSE}{n-2}} = \sqrt{MSE}$
•  Notice: n – 2 is the df for SSE. Why n – 2?
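The steps on this slide (residuals, then SSE, then the estimate of σ) can be sketched as follows. This is an illustrative Python version with hypothetical data, not the course's SAS output. The helper name residual_std is made up for this example.

```python
import math

def residual_std(x, y):
    """Return (residuals, SSE, s_e) for the least-squares line through (x, y)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx                      # slope estimate
    a = y_bar - b * x_bar              # intercept estimate
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)
    s_e = math.sqrt(sse / (n - 2))     # n - 2 df: two parameters were estimated
    return residuals, sse, s_e

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
residuals, sse, s_e = residual_std(x, y)
print(round(sse, 4), round(s_e, 4))
```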

Sampling distribution of the slope, b


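Confidence interval for β

•  The interval is: $b \pm t^{*} \cdot s_b$, where t* is the critical value of the t distribution with df = n – 2 and s_b is the estimated standard deviation of b.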
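A minimal sketch of the interval in Python (not the course's SAS), using hypothetical data. Two assumptions are worth flagging: the standard error is computed as s_b = s_e / sqrt(Sxx), the usual formula, which is not shown in the surviving slide text, and the critical value is passed in by hand from a t table rather than computed.

```python
import math

def slope_ci(x, y, t_crit):
    """Return (lower, upper) of the interval b +/- t_crit * s_b."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(sse / (n - 2))
    s_b = s_e / math.sqrt(sxx)         # assumed standard-error formula for b
    return b - t_crit * s_b, b + t_crit * s_b

# Hypothetical data; 3.182 is the t-table value for 95% confidence, df = 5 - 2 = 3.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
lower, upper = slope_ci(x, y, t_crit=3.182)
print(round(lower, 3), round(upper, 3))
```

If the interval contains 0, the data are consistent with no linear relationship at that confidence level.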

Testing Hypotheses for β

•  H0: β = β0
•  Ha: β ≠ β0
•  The test statistic is: $t = \dfrac{b - \beta_0}{s_b}$, based on the t distribution with df = n – 2
•  Usually want to test H0: β = 0
–  Why?
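The test statistic can be sketched as below, again in Python with hypothetical data rather than SAS. The standard error s_b = s_e / sqrt(Sxx) is an assumption (the usual formula), and the decision compares |t| with a critical value read from a t table instead of computing a p-value.

```python
import math

def slope_t_statistic(x, y, beta0=0.0):
    """Return t = (b - beta0) / s_b for the least-squares slope b."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s_b = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)   # assumed formula for s_b
    return (b - beta0) / s_b

# Hypothetical data; H0: beta = 0 against Ha: beta != 0 at the 5% level.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
t = slope_t_statistic(x, y)
reject = abs(t) > 3.182   # 3.182: two-sided 5% critical value for df = 3
print(round(t, 4), reject)
```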

Example 1


Testing Linear Relationship in the SLR

Chapter 11


Example 2 —Height and Weight

•  The following data set gives the average heights and weights for American women aged 30-39 (source: The World Almanac and Book of Facts, 1975). Total observations 15.


Example—Height and Weight


SAS output—Height and Weight


Using the SAS output for inference

•  Construct a 95% confidence interval for β

•  Test whether or not there is a significant linear relationship (H0: β = 0)


SAS Code

proc reg data=example;
  model weight = height / alpha=0.05 clb;
  plot weight*height;
run;

•  Note the ‘clb’ option will produce a (1 – 0.05) = 95% confidence interval for β in addition to the hypothesis test

Review: Coefficient of Determination

•  Also, just the square of the correlation, r.
•  Multiplying r² by 100 gives the percent of variation in Y attributed to the linear regression between Y and X: the percent of variation explained.
•  Notice when SSR is large (or SSE small), we have explained a large amount of the variation in Y
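To see that the two descriptions of r² agree, the sketch below computes it both ways on hypothetical data: as 1 − SSE/SST (equivalently SSR/SST) and as the square of the sample correlation r. It is written in Python rather than SAS, and the function name is made up for this example.

```python
import math

def r_and_r_squared(x, y):
    """Return (r, r2) where r2 = 1 - SSE/SST from the least-squares fit."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)        # this is SST
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)                  # sample correlation
    return r, 1 - sse / syy

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r, r2 = r_and_r_squared(x, y)
print(round(r, 4), round(r2, 4), round(100 * r2, 1))
```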

Review: Standard Deviation about the regression line

•  Given by: $s_e = \sqrt{\dfrac{SSE}{n-2}}$
–  n – 2 comes from the degrees of freedom!
•  Roughly speaking
–  It is the typical amount by which an observation varies about the regression line
•  Also called “root MSE”, the square root of the Mean Square Error

SAS Code

proc reg data=example;
  model weight = height / alpha=0.05 clb;
  output out=fit r=res p=pred;   /* new: save residuals and predicted values */
  plot weight*height;
run;

•  This creates a new dataset called “fit”
•  It contains:
–  ei, residuals in a variable called “res”
–  ŷi, predicted values in a variable called “pred”

Review: Residual Plots

•  The residuals can be used to assess the appropriateness of a linear regression model.
•  Specifically, a residual plot (the residuals plotted against x) gives a good indication of whether the model is working
–  The residual plot should not have any pattern but should show a random scattering of points
–  If a pattern is observed, the linear regression model is probably not appropriate.

Examples—good


Examples— linearity violation


Examples— constant variance violation


Tests of Linear Relationship

•  H0: β = 0 ⇔ H0: ρ = 0
•  Ha: β ≠ 0 ⇔ Ha: ρ ≠ 0
•  The test statistic is: $t = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}$
•  Where r is the sample correlation coefficient
•  Still based on the t distribution with df = n – 2
•  Remark: A third way to test the linear relationship
–  construct the confidence interval for β and check whether it contains 0.
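A small Python sketch (not SAS) of the correlation-based statistic, using hypothetical data. The helper names are made up for this example.

```python
import math

def sample_correlation(x, y):
    """Sample correlation coefficient r."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

def t_from_r(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with df = n - 2."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = sample_correlation(x, y)
t = t_from_r(r, len(x))
print(round(r, 4), round(t, 4))
```

For simple linear regression this t equals the slope-based statistic b / s_b for H0: β = 0, which is why testing β = 0 and testing ρ = 0 lead to the same conclusion.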

Test of Linear Relationship, using ρ (optional)


Example 3


ANOVA Approach to Linear Regression


ANOVA Table

Source               DF      SS                  MS
Model (Regression)   1       SSM (or SSR)        MSM (or MSR) = SSM/1
Error                n – 2   SSE (or SSResid)    MSE = SSE/(n – 2)
Total                n – 1   SST = SSM + SSE

where $SSM = \sum (\hat{y}_i - \bar{y})^2$, $SSE = \sum (y_i - \hat{y}_i)^2$, and $SST = \sum (y_i - \bar{y})^2$.
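The entries of the table can be computed directly, as in this Python sketch on hypothetical data (not the course's SAS output). The ratio F = MSM/MSE is the usual ANOVA test statistic for H0: β = 0. It is an assumption here, since the slide that defines it did not survive in the text.

```python
def anova_table(x, y):
    """Return the ANOVA quantities for simple linear regression as a dict."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    fitted = [a + b * xi for xi in x]
    ssm = sum((fi - y_bar) ** 2 for fi in fitted)             # model, df = 1
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))    # error, df = n - 2
    sst = sum((yi - y_bar) ** 2 for yi in y)                  # total, df = n - 1
    msm = ssm / 1
    mse = sse / (n - 2)
    return {"SSM": ssm, "SSE": sse, "SST": sst,
            "MSM": msm, "MSE": mse, "F": msm / mse}

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
table = anova_table(x, y)
for name, value in table.items():
    print(name, round(value, 4))
```

In simple linear regression F equals the square of the t statistic for the slope, so the ANOVA approach and the t test give the same decision.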

ANOVA approach to Linear Regression


Revisit Example 3


SAS codes and Output


After Class…

•  Review today’s Lecture Notes, Section 11.1 (optional part: “Exponential Regression”) and 11.2

•  Read Section 11.4.

•  Lab #7, Wed, Apr 18th. Try the template code first …
•  Hw#11, Monday after next week.
