Inferential Methods in Regression and Correlation
Chapter 11
Lecture 27 4/15/12 1
Announcements
• Make-up for Final Exam
– fill out the form TWO WEEKS AHEAD of time (by Friday, Apr 20th),
– scheduled ahead of time,
– taken at the testing center…
Back to Ch.3 (Linear Regression):
• Recall Simple Linear Regression:
– Visualization, by scatterplot of response vs. predictor
– Fit a line to the data when you see a linear trend
– Minimize the errors using the least-squares (LS) method
– Get estimates of slope and intercept accordingly
– Random residuals
• In this chapter, we introduce
– The normal error regression model
– Treating the fitted “regression” as a sample result that we want to draw inference from
– Backward elimination to choose the “effective” variables in multiple linear regression (MLR).
Motivation
• Remember when we used to estimate µ?
• What did we do?
– Used confidence intervals to estimate where µ falls
– Used hypothesis testing to check specific hypotheses for µ
• Treat the regression line similarly
– Need to understand the sampling distribution again!
Normal Error Regression Model
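A sketch of the model in symbols, assuming the standard textbook form and the α, β, σ notation used later in this lecture (N(mean, sd) lists the standard deviation, to match ei ~ N(0, σ)):

```latex
\[
y_i = \alpha + \beta x_i + e_i, \qquad e_i \sim N(0, \sigma) \ \text{independently}, \qquad i = 1, \dots, n
\]
% Equivalently, for each fixed x the response is normal around the population regression line:
\[
y \mid x \ \sim\ N(\alpha + \beta x,\ \sigma)
\]
```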
A Toy Example
• Suppose we use Age to predict Blood Pressure
– Which is X? Y?
• Draw a picture… (for the bivariate data: Age and BP)
• For any fixed x, the dependent variable y has a normal distribution
– The mean of y falls on the “population regression line”
– P(3 < y < 5) = ? Use the Normal Error Regression Model…
• Another way to say the same thing is just: ei ~ N(0, σ)
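The probability question above can be worked by standardizing. A sketch, writing µ_x = α + βx for the mean of y at the fixed x and Φ for the standard normal CDF (both symbols are introduced here for illustration, and the numbers in the comment are hypothetical):

```latex
\[
P(3 < y < 5)
  = P\!\left(\frac{3-\mu_x}{\sigma} < Z < \frac{5-\mu_x}{\sigma}\right)
  = \Phi\!\left(\frac{5-\mu_x}{\sigma}\right) - \Phi\!\left(\frac{3-\mu_x}{\sigma}\right),
  \qquad \mu_x = \alpha + \beta x .
\]
% Hypothetical numbers: if mu_x = 4 and sigma = 1, this is Phi(1) - Phi(-1), about 0.68.
```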
Estimating the slope and intercept
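A sketch of the usual least-squares estimates, assuming the standard formulas; S_xx and S_xy are shorthand introduced here, and a and b are the sample intercept and slope as in the rest of the lecture:

```latex
\[
S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad
S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
\]
\[
b = \frac{S_{xy}}{S_{xx}}, \qquad a = \bar{y} - b\,\bar{x}, \qquad \hat{y}_i = a + b\,x_i
\]
```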
Why This Model?
Only estimates!!!
• Of course, these are only sample estimates
– If we took a different sample, we would get different estimates
– Need to use these estimates a and b to draw inference about the “real” intercept and slope, α and β
• Need to know the sampling distribution…
– No problem.
– We already know it’s normal; the important statement is this again: ei ~ N(0, σ)
• Meanings of a and b:
– Intercept: a, the predicted value of y when x = 0.
– Slope: b, the change in predicted y for a 1-unit increase in x.
Estimating the error variance
• From the model, we know that ei ~ N(0, σ)
• To estimate σ we use the residuals
• After the slope and intercept are estimated, the residuals are calculated as: $e_i = y_i - \hat{y}_i$
• SSE is calculated as: $SSE = \sum e_i^2 = \sum (y_i - \hat{y}_i)^2$
• It is used to estimate σ by: $s_e = \sqrt{\frac{SSE}{n-2}} = \sqrt{MSE}$
• Notice: n – 2 is the df for SSE. Why n – 2?
Sampling distribution of the slope, b
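A sketch of the standard result, assuming the normal error regression model holds; S_xx is shorthand introduced here, s_e is the estimate of σ computed from the residuals, and N(mean, sd) lists the standard deviation as in ei ~ N(0, σ):

```latex
\[
b \ \sim\ N\!\left(\beta,\ \frac{\sigma}{\sqrt{S_{xx}}}\right), \qquad S_{xx} = \sum_{i=1}^{n}(x_i-\bar{x})^2
\]
% Replacing the unknown sigma by s_e gives the estimated standard error and a t distribution:
\[
s_b = \frac{s_e}{\sqrt{S_{xx}}}, \qquad \frac{b-\beta}{s_b} \ \sim\ t_{\,n-2}
\]
```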
Confidence interval for β
• The interval is: b ± (t crit) * sb, where the t critical value is based on df = n – 2
Testing Hypotheses for β
• H0: β = β0
• Ha: β ≠ β0
• Test statistic is: $t = \frac{b - \beta_0}{s_b}$, based on t distribution with df = n – 2
• Usually want to test H0: β = 0
– Why?
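A worked sketch of the interval and the test, using made-up numbers (n = 12, b = 2.0 and s_b = 0.5 are hypothetical, not taken from any example in this lecture); 2.228 is the t table value for 10 df with 0.025 in the upper tail:

```latex
\[
\text{df} = n - 2 = 10, \qquad t_{\text{crit}} = 2.228
\]
\[
\text{95\% CI for } \beta:\quad b \pm t_{\text{crit}}\, s_b = 2.0 \pm 2.228\,(0.5) = 2.0 \pm 1.114 = (0.886,\ 3.114)
\]
\[
H_0:\beta = 0:\quad t = \frac{b - 0}{s_b} = \frac{2.0}{0.5} = 4.0 > 2.228
\ \Rightarrow\ \text{reject } H_0 \text{ at the 5\% level}
\]
```

Here the interval excludes 0 and the test rejects H0: β = 0, so both point to a linear relationship. When β = 0 the mean of y does not change with x, which is why 0 is the usual null value.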
Example 1
Testing Linear Relationship in the SLR
Chapter 11
Example 2 —Height and Weight
• The following data set gives the average heights and weights for American women aged 30-39 (source: The World Almanac and Book of Facts, 1975). There are 15 observations in total.
Example—Height and Weight
SAS output—Height and Weight
Using the SAS output for inference
• Construct a 95% confidence interval for β
• Test whether or not there is a significant linear relationship (H0: β = 0)
SAS Code
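proc reg data=example;
  /* regress weight on height; clb requests confidence limits at level 1 - alpha */
  model weight = height / alpha=0.05 clb;
  /* scatterplot of weight against height */
  plot weight*height;
run;
• Note the ‘clb’ option will produce a (1 – 0.05) = 95% confidence interval for β (and for the intercept α), in addition to the hypothesis test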
Review: Coefficient of Determination
• Also, just the square of the correlation, r.
• Multiplying r² by 100 gives the percent of variation in Y explained by the linear regression on X.
• Notice when SSR is large (or SSE small), we have explained a large amount of the variation in Y
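In symbols, a sketch assuming the usual definitions, with SST = SSR + SSE as in the ANOVA table later in this lecture (the numbers in the comment are hypothetical):

```latex
\[
r^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}, \qquad SST = SSR + SSE, \qquad 0 \le r^2 \le 1
\]
% Hypothetical numbers: SSR = 80 and SSE = 20 give SST = 100 and r^2 = 0.80,
% i.e. 80% of the variation in Y is explained by the regression on X.
```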
Review: Standard Deviation about the regression line
• Given by: $s_e = \sqrt{\frac{SSE}{n-2}}$
– n – 2 comes from the degrees of freedom!
• Roughly speaking
– It is the typical amount by which an observation varies about the regression line
• Also called “root MSE” or the square root of the Mean Square Error
SAS Code
proc reg data=example;
  model weight = height / alpha=0.05 clb;
  /* save residuals (r=) and predicted values (p=) to a new dataset */
  output out=fit r=res p=pred;
  plot weight*height;
run;
• This creates a new dataset called “fit”
• It contains:
– ei, residuals in a variable called “res”
– ŷi, predicted values in a variable called “pred”
Review: Residual Plots
• The residuals can be used to assess the appropriateness of a linear regression model.
• Specifically, a residual plot (the residuals plotted against x) gives a good indication of whether the model is working
– The residual plot should not have any pattern but should show a random scattering of points
– If a pattern is observed, the linear regression model is probably not appropriate.
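A minimal sketch of drawing such a residual plot in SAS, assuming the “fit” dataset and the “res” variable created by the output statement on the previous slide, and that “height” is the predictor. PROC SGPLOT is one option among several, not necessarily the one used in class:

```sas
/* residuals against the predictor, with a horizontal reference line at 0 */
proc sgplot data=fit;
  scatter x=height y=res;
  refline 0 / axis=y;
run;
```

A random scatter around the zero line supports the linear model; a curve or a funnel shape suggests a linearity or constant-variance problem.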
Examples—good
Examples— linearity violation
Examples— constant variance violation
Tests of Linear Relationship
• H0: β = 0 ⇔ H0: ρ = 0
• Ha: β ≠ 0 ⇔ Ha: ρ ≠ 0
• Test statistic is: $t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$
• Where r is the sample correlation coefficient
• Still based on t distribution with df = n – 2
• Remark: A third way to test the linear relationship is to construct the confidence interval for β and check whether it contains 0.
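A sketch of why this statistic equals the slope statistic t = b / s_b for H0: β = 0, assuming the standard identities b = S_xy / S_xx, s_b = s_e / √S_xx, r = S_xy / √(S_xx · SST) and SSE = (1 − r²) · SST (S_xx and S_xy are shorthand introduced here for the sums of squares and cross-products of deviations from the means):

```latex
\[
b = \frac{S_{xy}}{S_{xx}} = r\sqrt{\frac{SST}{S_{xx}}}, \qquad
s_e = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{(1-r^2)\,SST}{n-2}}, \qquad
s_b = \frac{s_e}{\sqrt{S_{xx}}}
\]
\[
t = \frac{b}{s_b}
  = r\sqrt{\frac{SST}{S_{xx}}}\cdot\frac{\sqrt{S_{xx}}\,\sqrt{n-2}}{\sqrt{(1-r^2)\,SST}}
  = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}
\]
```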
Test of Linear Relationship, using ρ (optional)
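A minimal sketch of getting r and the test of H0: ρ = 0 from SAS, assuming the “example” dataset with variables “height” and “weight” from the earlier slides. PROC CORR reports the Pearson correlation together with a p-value for H0: ρ = 0:

```sas
/* Pearson correlation between height and weight, with the p-value for H0: rho = 0 */
proc corr data=example pearson;
  var height weight;
run;
```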
Example 3
ANOVA Approach to Linear Regression
ANOVA Table
Source               DF       SS                                                MS
Model (Regression)   1        SSM (or SSR) = $\sum (\hat{y}_i - \bar{y})^2$     MSM (or MSR) = SSM/1
Error                n – 2    SSE (or SSResid) = $\sum (y_i - \hat{y}_i)^2$     MSE = SSE/(n – 2)
Total                n – 1    SST = SSM + SSE = $\sum (y_i - \bar{y})^2$
ANOVA approach to Linear Regression
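A sketch of the F test that the ANOVA table supports, assuming the standard result for simple linear regression (one predictor), with MSM and MSE as defined in the ANOVA table:

```latex
\[
F = \frac{MSM}{MSE} = \frac{SSM/1}{SSE/(n-2)} \ \sim\ F_{1,\,n-2} \quad \text{under } H_0:\beta = 0
\]
% With a single predictor, the F statistic is the square of the slope t statistic:
\[
F = t^2, \qquad t = \frac{b}{s_b}
\]
% A large F (small p-value) is evidence against H0: beta = 0; the F test and the two-sided t test give the same p-value.
```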
Revisit Example 3
SAS codes and Output
After Class…
• Review today’s Lecture Notes, Section 11.1 (optional part: “Exponential Regression”) and 11.2
• Read Section 11.4.
• Lab #7, Wed, Apr 18th. Try the template codes first …
• Hw#11, Monday after next week.