
ENGR 610 Applied Statistics

Fall 2007 - Week 11

Marshall University

CITE

Jack Smith

Overview for Today

- Review Simple Linear Regression, Ch 12
- Go over problem 12.56
- Multiple Linear Regression, Ch 13 (1-5)
  - Multiple explanatory variables
  - Coefficient of multiple determination
  - Adjusted R2
  - Residual analysis
  - F-test
  - t test and confidence interval for slope
  - Partial F-tests for each variable's individual contribution
  - Coefficients of partial determination
- Homework assignment

Regression Modeling

- Analysis of variance to "fit" a predictive model for a response (dependent) variable to a set of one or more explanatory (independent) variables
- Minimize residual error with respect to the linear coefficients
- Interpolate over the relevant range - do not extrapolate
- Typically linear, but may be curvilinear or more complex (w.r.t. the independent variables)
- Related to Correlation Analysis - measuring the strength of association between variables
  - Regression is about variance in the response variable
  - Correlation is about co-variance - symmetric

Types of Regression Models

- Based on scatter plots of Y vs X (dependent vs independent)
- Linear Models
  - Positive, negative or no slope
  - Zero or non-zero intercept
- Curvilinear Models
  - Positive, negative or no "slope"
  - Positive, negative or varied curvature
  - May be U-shaped, with extrema
  - May be asymptotically or piece-wise linear
  - May be polynomial, exponential, inverse,…

Least-Squares Linear Regression

- Simple linear model (for population): $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
  - $X_i$ = value of independent variable
  - $Y_i$ = observed value of dependent variable
  - $\beta_0$ = Y-intercept (Y at X = 0)
  - $\beta_1$ = slope ($\Delta Y / \Delta X$)
  - $\varepsilon_i$ = random error for observation i
- Predicted value: $\hat{Y}_i = b_0 + b_1 X_i$
  - $b_0$ and $b_1$ are called the regression coefficients
- Residual: $e_i = Y_i - \hat{Y}_i$
- Minimize $\sum e_i^2$ for the sample with respect to $b_0$ and $b_1$
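As a concrete illustration, here is a minimal least-squares sketch in Python with NumPy; the (x, y) data are hypothetical, not from the text, and the slope and intercept follow the definitions above.

```python
# A minimal least-squares fit on hypothetical data, using the
# definitional formulas for the regression coefficients.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical X values
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])  # hypothetical Y values

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

y_hat = b0 + b1 * x   # predicted values
e = y - y_hat         # residuals e_i = Y_i - Yhat_i
print(b0, b1, np.sum(e ** 2))  # intercept, slope, sum of squared residuals
```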

Partitioning of Variation

- Total variation: $SST = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$, where $\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$ (mean response)
- Regression variation: $SSR = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$
- Random variation: $SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$
- $SST = SSR + SSE$

Coefficient of Determination

$r^2 = SSR / SST$

Standard Error of the Estimate

$S_{YX} = \sqrt{\dfrac{SSE}{n-2}}$
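A short sketch of the variance partition on the same hypothetical data, verifying SST = SSR + SSE and computing $r^2$ and the standard error of the estimate.

```python
# Check of the variance partition on hypothetical data:
# SST = SSR + SSE, r^2 = SSR/SST, S_YX = sqrt(SSE/(n-2)).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
b1, b0 = np.polyfit(x, y, 1)   # least-squares slope and intercept
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean()) ** 2)      # total variation
ssr = np.sum((y_hat - y.mean()) ** 2)  # regression variation
sse = np.sum((y - y_hat) ** 2)         # random variation
assert np.isclose(sst, ssr + sse)

r2 = ssr / sst                      # coefficient of determination
s_yx = np.sqrt(sse / (len(y) - 2))  # standard error of the estimate
```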

Partitioning of Variation - Graphically

Assumptions of Regression (and Correlation)

- Normality of error about the regression line
- Homoscedasticity (equal variance) along X
- Independence of errors with respect to X
- No autocorrelation in time
- Analysis of residuals to test these assumptions
  - Histogram, box-and-whisker plots
  - Normal probability plot
  - Ordered plots (by X, by time,…)
- See figures on pp 584-5

t Test for Slope

$t = \dfrac{b_1 - \beta_1}{S_{b_1}}$, with $H_0\!: \beta_1 = 0$

where

$S_{b_1} = \dfrac{S_{YX}}{\sqrt{SSX}}$, $S_{YX} = \sqrt{\dfrac{SSE}{n-2}}$, $SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$, $SSX = \sum_{i=1}^{n} (X_i - \bar{X})^2$

Critical t value based on chosen level of significance, $\alpha$, and n-2 degrees of freedom.

F Test for Single Regression

- F = MSR / MSE
- Reject $H_0$ if $F > F_U(\alpha, 1, n-2)$ [or $p < \alpha$]
- Note: $t^2(\alpha, n-2) = F_U(\alpha, 1, n-2)$
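A sketch of both tests on the hypothetical data from earlier; the final assertion confirms the $t^2 = F$ identity noted above.

```python
# Slope t test and single-regression F test on hypothetical data;
# for simple regression, t^2 equals F.
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = len(x)
b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x

sse = np.sum((y - y_hat) ** 2)
ssx = np.sum((x - x.mean()) ** 2)
s_yx = np.sqrt(sse / (n - 2))
s_b1 = s_yx / np.sqrt(ssx)

t_stat = b1 / s_b1                    # H0: beta1 = 0
ssr = np.sum((y_hat - y.mean()) ** 2)
f_stat = (ssr / 1) / (sse / (n - 2))  # MSR / MSE
p_value = stats.f.sf(f_stat, 1, n - 2)
assert np.isclose(t_stat ** 2, f_stat)
```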

One-Way ANOVA Summary

| Source     | Degrees of Freedom (df) | Sum of Squares (SS) | Mean Square (MS) (Variance) | F       | p-value |
|------------|-------------------------|---------------------|-----------------------------|---------|---------|
| Regression | 1                       | SSR                 | MSR = SSR/1                 | MSR/MSE |         |
| Error      | n-2                     | SSE                 | MSE = SSE/(n-2)             |         |         |
| Total      | n-1                     | SST                 |                             |         |         |

Confidence and Prediction Intervals

- Confidence interval estimate for the slope:
  $b_1 - t_{n-2} S_{b_1} \le \beta_1 \le b_1 + t_{n-2} S_{b_1}$
- Confidence interval estimate for the mean response:
  $\hat{Y}_i - t_{n-2} S_{YX} \sqrt{h_i} \le \mu_{Y|X_i} \le \hat{Y}_i + t_{n-2} S_{YX} \sqrt{h_i}$
- Prediction interval estimate for an individual response:
  $\hat{Y}_i - t_{n-2} S_{YX} \sqrt{1 + h_i} \le Y_i \le \hat{Y}_i + t_{n-2} S_{YX} \sqrt{1 + h_i}$

where $h_i = \dfrac{1}{n} + \dfrac{(X_i - \bar{X})^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2}$

See Fig 12.16, p 592.
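A sketch of the interval computations at a single hypothetical point x_new; the leverage term $h_i$ and the critical t follow the formulas above.

```python
# Confidence interval for the mean response and prediction interval
# for an individual response at a hypothetical x_new = 3.0 (95% level).
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = len(x)
b1, b0 = np.polyfit(x, y, 1)
s_yx = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))

x_new = 3.0
y_new = b0 + b1 * x_new
h = 1 / n + (x_new - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2)
t_crit = stats.t.ppf(0.975, n - 2)  # two-sided 95%

ci_mean = (y_new - t_crit * s_yx * np.sqrt(h),
           y_new + t_crit * s_yx * np.sqrt(h))
pi_indiv = (y_new - t_crit * s_yx * np.sqrt(1 + h),
            y_new + t_crit * s_yx * np.sqrt(1 + h))
```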

Pitfalls

- Not testing the assumptions of least-squares regression by analyzing residuals, looking for
  - Patterns
  - Outliers
  - Non-uniform distribution about the mean
  - See Figs 12.18-19, pp 597-8
- Not being aware of alternatives to least-squares regression when assumptions are violated
- Not knowing the subject matter being modeled

Computing by Hand - Slope and Y-Intercept

Slope:

$b_1 = \dfrac{SSXY}{SSX}$

$SSXY = \sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y}) = \sum_{i=1}^{n} X_i Y_i - \frac{1}{n}\left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} Y_i\right)$

$SSX = \sum_{i=1}^{n}(X_i - \bar{X})^2 = \sum_{i=1}^{n} X_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} X_i\right)^2$

Y-intercept:

$b_0 = \bar{Y} - b_1 \bar{X}$, where $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ and $\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$
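The shortcut sums above, sketched in Python on the same hypothetical data; the assertion checks the shortcut against the definitional form of SSXY.

```python
# Hand-computation shortcuts for the slope and intercept,
# cross-checked against the definitional form of SSXY.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = len(x)

ssxy = np.sum(x * y) - np.sum(x) * np.sum(y) / n
ssx = np.sum(x ** 2) - np.sum(x) ** 2 / n
b1 = ssxy / ssx
b0 = y.mean() - b1 * x.mean()
assert np.isclose(ssxy, np.sum((x - x.mean()) * (y - y.mean())))
```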

Computing by Hand - Measures of Variation

$SST = \sum_{i=1}^{n}(Y_i - \bar{Y})^2 = \sum_{i=1}^{n} Y_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} Y_i\right)^2$

$SSR = \sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2 = b_0 \sum_{i=1}^{n} Y_i + b_1 \sum_{i=1}^{n} X_i Y_i - \frac{1}{n}\left(\sum_{i=1}^{n} Y_i\right)^2$

$SSE = \sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} Y_i^2 - b_0 \sum_{i=1}^{n} Y_i - b_1 \sum_{i=1}^{n} X_i Y_i$
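A quick check of these shortcut forms on the same hypothetical data; algebraically they must satisfy SST = SSR + SSE, which the assertion confirms.

```python
# Shortcut forms of SST, SSR, SSE computed from raw sums.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = len(x)
b1, b0 = np.polyfit(x, y, 1)

sst = np.sum(y ** 2) - np.sum(y) ** 2 / n
ssr = b0 * np.sum(y) + b1 * np.sum(x * y) - np.sum(y) ** 2 / n
sse = np.sum(y ** 2) - b0 * np.sum(y) - b1 * np.sum(x * y)
assert np.isclose(sst, ssr + sse)
```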

Coefficient of Correlation

- For a regression: $r = \pm\sqrt{r^2} = \pm\sqrt{\dfrac{SSR}{SST}}$ (taking the sign of the slope)
- For a correlation: $r = \dfrac{SSXY}{\sqrt{SSX \cdot SSY}}$

where

$SSX = \sum_{i=1}^{n}(X_i - \bar{X})^2$, $SSY = \sum_{i=1}^{n}(Y_i - \bar{Y})^2$, $SSXY = \sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})$ (covariance)

Also called Pearson's product-moment correlation coefficient.
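Both routes to r, sketched on the hypothetical data and cross-checked against NumPy's built-in correlation.

```python
# Pearson's r computed directly from SSXY / sqrt(SSX * SSY).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

ssx = np.sum((x - x.mean()) ** 2)
ssy = np.sum((y - y.mean()) ** 2)
ssxy = np.sum((x - x.mean()) * (y - y.mean()))
r = ssxy / np.sqrt(ssx * ssy)
assert np.isclose(r, np.corrcoef(x, y)[0, 1])  # cross-check with NumPy
```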

t Test for Correlation

$t = \dfrac{r}{\sqrt{\dfrac{1-r^2}{n-2}}}$, with $H_0\!: \rho = 0$

Critical t value based on chosen level of significance, $\alpha$, and n-2 degrees of freedom.

Or, equivalently: $F = \dfrac{r^2 (n-2)}{1-r^2}$, compared to $F_U(\alpha, 1, n-2) = t^2(\alpha, n-2)$
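A sketch of the correlation t test and the equivalent F form, using a hypothetical r and n (not values from the text).

```python
# Correlation t test (H0: rho = 0) and the equivalent F form.
import numpy as np
from scipy import stats

n = 5
r = 0.998  # hypothetical correlation value

t_stat = r / np.sqrt((1 - r ** 2) / (n - 2))
f_stat = r ** 2 * (n - 2) / (1 - r ** 2)
assert np.isclose(t_stat ** 2, f_stat)
p_value = 2 * stats.t.sf(abs(t_stat), n - 2)
```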

Multiple Regression

- Linear model with multiple explanatory (independent) variables: $Y_i = \beta_0 + \beta_1 X_{1i} + \cdots + \beta_k X_{ki} + \varepsilon_i$
  - $X_{ji}$ = value of independent variable j for observation i
  - $Y_i$ = observed value of dependent variable
  - $\beta_0$ = Y-intercept (Y when all X = 0)
  - $\beta_j$ = slope ($\Delta Y / \Delta X_j$)
  - $\varepsilon_i$ = random error for observation i
- Predicted value: $\hat{Y}_i = b_0 + b_1 X_{1i} + \cdots + b_k X_{ki}$
  - The $b_j$'s are called the regression coefficients
- Residual: $e_i = Y_i - \hat{Y}_i$
- Minimize $\sum e_i^2$ for the sample with respect to all $b_j$
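A minimal multiple-regression sketch with k = 2 hypothetical explanatory variables; NumPy's least-squares solver minimizes the same sum of squared residuals.

```python
# Multiple regression via least squares on hypothetical data (k = 2).
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = np.array([3.1, 3.9, 7.0, 7.7, 11.2, 10.9])

X = np.column_stack([np.ones_like(x1), x1, x2])  # design matrix [1, X1, X2]
b, *_ = np.linalg.lstsq(X, y, rcond=None)        # b = [b0, b1, b2]
y_hat = X @ b
e = y - y_hat  # residuals
```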

Partitioning of Variation

- Total variation: $SST = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$, where $\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$ (mean response)
- Regression variation: $SSR = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$
- Random variation: $SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$
- $SST = SSR + SSE$

Coefficient of Multiple Determination

$R^2_{Y.12\ldots k} = SSR / SST$

Standard Error of the Estimate

$S_{YX} = \sqrt{\dfrac{SSE}{n-k-1}}$

Adjusted R2

To account for sample size (n) and number of independent variables (k) for comparison purposes:

$R^2_{adj} = 1 - \left(1 - R^2_{Y.12\ldots k}\right)\dfrac{n-1}{n-k-1}$
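Adjusted R^2 computed per the formula above, repeating the hypothetical k = 2 fit so the snippet runs standalone.

```python
# Adjusted R^2 for the hypothetical k = 2 multiple regression.
import numpy as np

X = np.column_stack([np.ones(6),
                     [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
                     [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]])
y = np.array([3.1, 3.9, 7.0, 7.7, 11.2, 10.9])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ b

n, k = len(y), 2
r2 = np.sum((y_hat - y.mean()) ** 2) / np.sum((y - y.mean()) ** 2)
r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)
```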

Residual Analysis

- Plot residuals vs
  - $\hat{Y}_i$ (predicted values)
  - $X_1, X_2, \ldots, X_k$
  - Time (for autocorrelation)
- Check for (see the sketch after this list)
  - Patterns
  - Outliers
  - Non-uniform distribution about the mean
- See Figs 12.18-19, pp 597-8
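A minimal residual-plot sketch with Matplotlib, on the hypothetical data repeated from above: residuals against the predicted values and against each X, which is where patterns would show up.

```python
# Residual plots: residuals vs predicted values and vs each X.
import numpy as np
import matplotlib.pyplot as plt

X = np.column_stack([np.ones(6),
                     [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
                     [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]])
y = np.array([3.1, 3.9, 7.0, 7.7, 11.2, 10.9])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ b
e = y - y_hat

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
for ax, (label, v) in zip(axes, [("predicted", y_hat),
                                 ("X1", X[:, 1]),
                                 ("X2", X[:, 2])]):
    ax.scatter(v, e)
    ax.axhline(0, linestyle="--")
    ax.set_xlabel(label)
    ax.set_ylabel("residual")
plt.show()
```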

F Test for Multiple Regression

- F = MSR / MSE
- Reject $H_0$ if $F > F_U(\alpha, k, n-k-1)$ [or $p < \alpha$]
- k = number of independent variables

One-Way ANOVA Summary

| Source     | Degrees of Freedom (df) | Sum of Squares (SS) | Mean Square (MS) (Variance) | F       | p-value |
|------------|-------------------------|---------------------|-----------------------------|---------|---------|
| Regression | k                       | SSR                 | MSR = SSR/k                 | MSR/MSE |         |
| Error      | n-k-1                   | SSE                 | MSE = SSE/(n-k-1)           |         |         |
| Total      | n-1                     | SST                 |                             |         |         |

Alternate F-Test

$F = \dfrac{R^2 / k}{(1 - R^2)/(n-k-1)}$

Compared to $F_U(\alpha, k, n-k-1)$
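A sketch confirming that the alternate F computed from R^2 matches MSR/MSE from the ANOVA table, on the hypothetical k = 2 data.

```python
# Overall F test two ways: MSR/MSE and the alternate form from R^2.
import numpy as np
from scipy import stats

X = np.column_stack([np.ones(6),
                     [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
                     [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]])
y = np.array([3.1, 3.9, 7.0, 7.7, 11.2, 10.9])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ b
n, k = len(y), 2

ssr = np.sum((y_hat - y.mean()) ** 2)
sse = np.sum((y - y_hat) ** 2)
r2 = ssr / (ssr + sse)

f_anova = (ssr / k) / (sse / (n - k - 1))       # MSR / MSE
f_alt = (r2 / k) / ((1 - r2) / (n - k - 1))     # from R^2
assert np.isclose(f_anova, f_alt)
p_value = stats.f.sf(f_anova, k, n - k - 1)
```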

t Test for Slope

$t = \dfrac{b_j - \beta_j}{S_{b_j}}$, with $H_0\!: \beta_j = 0$

where

$S_{b_j} = \dfrac{S_{Y.12\ldots k}}{\sqrt{SSX_j}}$, $S_{Y.12\ldots k} = \sqrt{\dfrac{SSE}{n-k-1}}$, $SSE = \sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2$, $SSX_j = \sum_{i=1}^{n}(X_{ji} - \bar{X}_j)^2$

Critical t value based on chosen level of significance, $\alpha$, and n-k-1 degrees of freedom.

See output from PHStat.
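As a sketch of these per-slope t tests: the slide's $S_{b_j}$ expression is the single-variable form; a common matrix generalization (an assumption here, not from the slides) takes the standard errors from the diagonal of $MSE \cdot (X^T X)^{-1}$, as below.

```python
# Per-coefficient t tests; standard errors from the diagonal of
# MSE * (X'X)^-1, the usual matrix form for multiple regression.
import numpy as np
from scipy import stats

X = np.column_stack([np.ones(6),
                     [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
                     [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]])
y = np.array([3.1, 3.9, 7.0, 7.7, 11.2, 10.9])
n, k = len(y), 2
b, *_ = np.linalg.lstsq(X, y, rcond=None)

mse = np.sum((y - X @ b) ** 2) / (n - k - 1)
cov_b = mse * np.linalg.inv(X.T @ X)
s_b = np.sqrt(np.diag(cov_b))  # standard errors [S_b0, S_b1, S_b2]

t_stats = b / s_b              # H0: beta_j = 0 for each coefficient
p_values = 2 * stats.t.sf(np.abs(t_stats), n - k - 1)
```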

Confidence and Prediction Intervals

- Confidence interval estimate for the slope:
  $b_j - t_{n-k-1} S_{b_j} \le \beta_j \le b_j + t_{n-k-1} S_{b_j}$
- Confidence interval estimate for the mean and prediction interval estimate for an individual response: beyond the scope of this text

Partial F Tests

- Significance test for the contribution from each individual independent variable
- Measure of incremental improvement, with all others already taken into account
- $F_j = SSR(X_j \mid \{X_{i \ne j}\}) / MSE$, where $SSR(X_j \mid \{X_{i \ne j}\}) = SSR - SSR(\{X_{i \ne j}\})$
- Reject $H_0$ if $F_j > F_U(\alpha, 1, n-k-1)$ [or $p < \alpha$]
- Note: $t^2(\alpha, n-k-1) = F_U(\alpha, 1, n-k-1)$

Coefficients of Partial Determination

$R^2_{Yj.12\ldots k(\bar{j})} = \dfrac{SSR(X_j \mid \{X_{i \ne j}\})}{SST - SSR + SSR(X_j \mid \{X_{i \ne j}\})} = \dfrac{SSR - SSR(\{X_{i \ne j}\})}{SST - SSR(\{X_{i \ne j}\})}$

See PHStat output in Fig 13.10, p 637.
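A sketch covering both the partial F test above and the coefficient of partial determination, testing the contribution of X2 given X1 on the hypothetical data; ssr_of is a hypothetical helper.

```python
# Partial F test and coefficient of partial determination for X2 | X1.
import numpy as np
from scipy import stats

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = np.array([3.1, 3.9, 7.0, 7.7, 11.2, 10.9])
n, k = len(y), 2

def ssr_of(*cols):
    """SSR for a least-squares fit of y on an intercept plus cols."""
    X = np.column_stack([np.ones(len(y)), *cols])
    y_hat = X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return np.sum((y_hat - y.mean()) ** 2)

sst = np.sum((y - y.mean()) ** 2)
ssr_full = ssr_of(x1, x2)
mse_full = (sst - ssr_full) / (n - k - 1)

ssr_x2_given_x1 = ssr_full - ssr_of(x1)  # incremental SSR for X2
f_partial = ssr_x2_given_x1 / mse_full   # compare to FU(alpha, 1, n-k-1)
p_value = stats.f.sf(f_partial, 1, n - k - 1)
r2_partial = ssr_x2_given_x1 / (sst - ssr_full + ssr_x2_given_x1)
```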

Homework

- Review "Multiple Regression", 13.1-5
- Work through Appendix 13.1
- Work and hand in Problem 13.62
- Read "Multiple Regression", 13.6-11
  - Quadratic model
  - Dummy-variable model
  - Using transformations
  - Collinearity (VIF)
  - Model building
    - Cp statistic and stepwise regression
- Preview problems 13.63-13.67