Doane — Chapter 13: Multiple Regression



Chapter 13: Multiple Regression

Topics:
- Multiple Regression
- Assessing Overall Fit
- Predictor Significance
- Confidence Intervals for Y
- Binary Predictors
- Tests for Nonlinearity and Interaction
- Multicollinearity
- Violations of Assumptions
- Other Regression Topics

Multiple Regression

Bivariate or Multivariate?
- Multiple regression is an extension of bivariate regression to include more than one independent variable.
- Limitations of bivariate regression:
  - often simplistic
  - biased estimates if relevant predictors are omitted
  - lack of fit does not show that X is unrelated to Y

McGraw-Hill/Irwin © 2007 The McGraw-Hill Companies, Inc. All rights reserved.

Multiple Regression

Regression Terminology
- Y is the response variable and is assumed to be related to the k predictors (X1, X2, …, Xk) by a linear equation called the population regression model:

  Y = β0 + β1X1 + β2X2 + … + βkXk + ε

- The fitted regression equation is:

  Ŷ = b0 + b1X1 + b2X2 + … + bkXk
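The fitted coefficients b0, b1, …, bk come from least squares. A minimal NumPy sketch on small hypothetical data (not the book's home-price sample; the y values are built to follow y = x1 + 2·x2 exactly so the recovered coefficients are easy to check):

```python
import numpy as np

# Hypothetical data: 6 observations of y and k = 2 predictors.
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 6.0],
              [6.0, 5.0]])
y = np.array([5.0, 4.0, 11.0, 10.0, 17.0, 16.0])  # exactly y = x1 + 2*x2

# Prepend a column of ones so the first coefficient is the intercept b0.
Xd = np.column_stack([np.ones(len(y)), X])
b, *_ = np.linalg.lstsq(Xd, y, rcond=None)

y_hat = Xd @ b  # fitted values from the fitted regression equation
```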

Multiple Regression

Data Format
- n observed values of the response variable Y and its proposed predictors X1, X2, …, Xk are presented in the form of an n x k matrix.

Multiple Regression

Illustration: Home Prices
- Consider the following data of the selling price of a home (Y, the response variable) and three potential explanatory variables:
  X1 = SqFt
  X2 = LotSize
  X3 = Baths

Multiple Regression

Illustration: Home Prices
- Intuitively, the regression models are of the form:

  Price = β0 + β1·SqFt + β2·LotSize + β3·Baths + ε

Multiple Regression

Logic of Variable Selection
- State the hypotheses about the sign of the coefficients in the model.

Multiple Regression

Fitted Regressions
- Use Excel, MegaStat, MINITAB, or any other statistical package.
- For n = 30 home sales, here are the fitted regressions and their statistics of fit.
- R² is the coefficient of determination and SE is the standard error of the regression.

Multiple Regression

Common Misconceptions about Fit
- A common mistake is to assume that the model with the best fit is preferred.
- Principle of Occam's Razor: when two explanations are otherwise equivalent, we prefer the simpler, more parsimonious one.

Multiple Regression

Regression Modeling
- Four Criteria for Regression Assessment:
  Logic: Is there an a priori reason to expect a causal relationship between the predictors and the response variable?
  Fit: Does the overall regression show a significant relationship between the predictors and the response variable?

Multiple Regression

Regression Modeling
- Four Criteria for Regression Assessment (continued):
  Parsimony: Does each predictor contribute significantly to the explanation? Are some predictors not worth the trouble?
  Stability: Are the predictors related to one another so strongly that regression estimates become erratic?

Assessing Overall Fit

F Test for Significance
- For a regression with k predictors, the hypotheses to be tested are
  H0: All the true coefficients are zero
  H1: At least one of the coefficients is nonzero
- In other words,
  H0: β1 = β2 = … = βk = 0
  H1: At least one of the coefficients is nonzero

Assessing Overall Fit

F Test for Significance
- The ANOVA table decomposes the variation of the response variable around its mean into an explained component (SSR, due to the regression) and an unexplained component (SSE, residual):

  SST = SSR + SSE

Assessing Overall Fit

F Test for Significance
- The ANOVA calculations for a k-predictor model can be summarized as:

  Source       Sum of squares   df         Mean square               F
  Regression   SSR              k          MSR = SSR / k             F = MSR / MSE
  Residual     SSE              n - k - 1  MSE = SSE / (n - k - 1)
  Total        SST              n - 1
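The ANOVA quantities can be computed directly from a fitted model. A sketch with NumPy on hypothetical data (n = 30 and k = 3 chosen to mirror the home-price setup, but the numbers are simulated, not the book's data):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 30, 3
X = rng.normal(size=(n, k))
y = 1.0 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=n)

Xd = np.column_stack([np.ones(n), X])      # design matrix with intercept
b, *_ = np.linalg.lstsq(Xd, y, rcond=None)
y_hat = Xd @ b

SST = np.sum((y - y.mean()) ** 2)          # total variation around the mean
SSR = np.sum((y_hat - y.mean()) ** 2)      # explained by the regression
SSE = np.sum((y - y_hat) ** 2)             # residual (unexplained)

MSR = SSR / k                              # regression df = k
MSE = SSE / (n - k - 1)                    # residual df = n - k - 1
F = MSR / MSE                              # overall F statistic
```

With an intercept in the model, SST = SSR + SSE holds exactly, which is a useful sanity check on the arithmetic.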

Assessing Overall Fit

F Test for Significance
- Here are the ANOVA calculations for the home price data.

Assessing Overall Fit

Coefficient of Determination (R²)
- R², the coefficient of determination, is a common measure of overall fit.
- It can be calculated one of two equivalent ways:

  R² = SSR / SST = 1 - SSE / SST

- For example, it can be computed either way for the home price data.

Assessing Overall Fit

Adjusted R²
- It is generally possible to raise the coefficient of determination R² by including additional predictors.
- The adjusted coefficient of determination penalizes the inclusion of useless predictors.
- For n observations and k predictors,

  R²adj = 1 - (1 - R²)(n - 1) / (n - k - 1)

- For the home price data, the adjusted R² is computed the same way.
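Both forms of R² and the adjustment can be sketched the same way (again on simulated hypothetical data, not the home-price sample):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 30, 3
X = rng.normal(size=(n, k))
y = 1.0 + X @ np.array([1.5, -2.0, 1.0]) + rng.normal(size=n)

Xd = np.column_stack([np.ones(n), X])
b, *_ = np.linalg.lstsq(Xd, y, rcond=None)
resid = y - Xd @ b

SSE = np.sum(resid ** 2)
SST = np.sum((y - y.mean()) ** 2)

R2 = 1 - SSE / SST                               # one of the two equivalent forms
R2_adj = 1 - (1 - R2) * (n - 1) / (n - k - 1)    # penalizes useless predictors
```

Because (n - 1)/(n - k - 1) > 1, the adjusted value is always below the raw R², and the gap widens as k grows relative to n.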

Assessing Overall Fit

How Many Predictors?
- Limit the number of predictors based on the sample size.
- When n/k is small, R² no longer gives a reliable indication of fit.
- Suggested rules:
  Evan's Rule (conservative): n/k ≥ 10 (at least 10 observations per predictor)
  Doane's Rule (relaxed): n/k ≥ 5 (at least 5 observations per predictor)

Predictor Significance

t Test for Significance
- Test each fitted coefficient to see whether it is significantly different from zero.
- The hypothesis tests for predictor Xj are
  H0: βj = 0
  H1: βj ≠ 0
- If we cannot reject the hypothesis that a coefficient is zero, then the corresponding predictor does not contribute to the prediction of Y.

Predictor Significance

Test Statistic
- The test statistic for the coefficient of predictor Xj is

  tj = bj / sbj

  where sbj is the standard error of the fitted coefficient bj.
- Find the critical value tα for a chosen level of significance α from Appendix D.
- Reject H0 if tj > tα or if the p-value < α.
- The 95% confidence interval for coefficient βj is

  bj ± tα/2 · sbj
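The coefficient standard errors come from the diagonal of MSE·(XᵀX)⁻¹. A NumPy sketch on hypothetical data, using the deck's t ≈ 2 shortcut in place of the exact tα/2 (X2's true coefficient is set to zero, so its t statistic should usually be small, while X1's should be clearly significant):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 30, 2
X = rng.normal(size=(n, k))
y = 4.0 + X @ np.array([3.0, 0.0]) + rng.normal(size=n)  # X2's true coefficient is 0

Xd = np.column_stack([np.ones(n), X])
b, *_ = np.linalg.lstsq(Xd, y, rcond=None)
resid = y - Xd @ b

MSE = np.sum(resid ** 2) / (n - k - 1)
s_b = np.sqrt(MSE * np.diag(np.linalg.inv(Xd.T @ Xd)))   # standard error of each b_j

t_stats = b / s_b                  # t_j = b_j / s_bj
ci_low = b - 2 * s_b               # quick 95% interval, using t ~= 2
ci_high = b + 2 * s_b
```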

Confidence Intervals for Y

Standard Error
- The standard error of the regression (SE) is another important measure of fit.
- For n observations and k predictors,

  SE = √(SSE / (n - k - 1))

- If all predictions were perfect, the SE = 0.
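SE follows directly from the residuals; a tiny sketch with made-up residuals (the values and the n = 6, k = 2 setup are hypothetical):

```python
import numpy as np

# Hypothetical residuals from a fit with n = 6 observations and k = 2 predictors.
resid = np.array([0.5, -0.3, 0.1, -0.4, 0.2, -0.1])
n, k = len(resid), 2

SSE = np.sum(resid ** 2)            # sum of squared residuals
SE = np.sqrt(SSE / (n - k - 1))     # standard error of the regression

# If every residual were zero (perfect predictions), SE would be 0.
```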

Confidence Intervals for Y

Standard Error
- Approximate 95% confidence interval for the conditional mean of Y.
- Approximate 95% prediction interval for an individual Y value.

Confidence Intervals for Y

Very Quick Prediction Interval for Y
- The t-values for 95% confidence are typically near 2 (as long as n is not too small).
- A very quick interval without using a t table is:

  ŷ ± 2·SE (approximate 95% prediction interval for an individual Y value)
  ŷ ± 2·SE/√n (approximate 95% confidence interval for the conditional mean of Y)
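The quick intervals are simple enough to compute by hand; a sketch where ŷ, SE, and n are made-up illustrative numbers (not from the home-price output):

```python
import math

# Hypothetical values: a predicted home price and the regression's standard error.
y_hat = 450.0   # predicted Y for a given set of predictor values
SE = 20.0       # standard error of the regression
n = 30          # sample size

# Very quick 95% prediction interval for an individual Y (t ~= 2):
pred_low, pred_high = y_hat - 2 * SE, y_hat + 2 * SE

# Quick 95% confidence interval for the conditional mean of Y:
mean_low = y_hat - 2 * SE / math.sqrt(n)
mean_high = y_hat + 2 * SE / math.sqrt(n)
```

As expected, the interval for the conditional mean is much narrower than the interval for an individual value.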

Binary Predictors

What Is a Binary Predictor?
- A binary predictor has two values (usually 0 and 1) to denote the presence or absence of a condition.
- For example, for n graduates from an MBA program:
  Employed = 1
  Unemployed = 0
- These variables are also called dummy or indicator variables.
- For clarity, name the binary variable for the characteristic that corresponds to the value 1.

Binary Predictors

Effects of a Binary Predictor
- A binary predictor is sometimes called a shift variable because it shifts the regression plane up or down.
- Suppose X1 is a binary predictor that can take on only the values 0 or 1.
- Its contribution to the regression is either b1 or nothing, resulting in an intercept of either b0 (when X1 = 0) or b0 + b1 (when X1 = 1).

Binary Predictors

Effects of a Binary Predictor
- The slope does not change; only the intercept is shifted.

Binary Predictors

Testing a Binary for Significance
- In multiple regression, binary predictors require no special treatment. They are tested like any other predictor, using a t test.

Binary Predictors

More Than One Binary
- More than one binary is needed when the number of categories to be coded exceeds two.
- For example, for the variable GPA by class level, each category is a binary variable:
  Freshman = 1 if a freshman, 0 otherwise
  Sophomore = 1 if a sophomore, 0 otherwise
  Junior = 1 if a junior, 0 otherwise
  Senior = 1 if a senior, 0 otherwise
  Masters = 1 if a master's candidate, 0 otherwise
  Doctoral = 1 if a PhD candidate, 0 otherwise

Binary Predictors

More Than One Binary
- If there are c mutually exclusive and collectively exhaustive categories, then only c - 1 binaries are needed to code each observation.
- Any one of the categories can be omitted, because the remaining c - 1 binary values uniquely determine the remaining binary.
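The c - 1 coding can be sketched directly. Here the data and the choice of "Freshman" as the omitted baseline are both hypothetical:

```python
import numpy as np

levels = ["Freshman", "Sophomore", "Junior", "Senior"]   # c = 4 categories
classes = np.array(["Junior", "Freshman", "Senior", "Junior", "Sophomore"])

# Code only c - 1 = 3 binaries, omitting "Freshman"; an observation in the
# omitted category shows up as an all-zeros row.
dummies = np.column_stack([(classes == lv).astype(int) for lv in levels[1:]])
```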

Binary Predictors

What if I Forget to Exclude One Binary?
- Including all c binaries for c categories would introduce a serious problem for the regression estimation.
- One column in the X data matrix will be a perfect linear combination of the other column(s).
- The least squares estimation would fail because the data matrix would be singular (i.e., would have no inverse).
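The failure is easy to demonstrate: with an intercept plus all c binaries, the dummy columns sum to the intercept column, so the design matrix loses rank (hypothetical three-category data):

```python
import numpy as np

cats = np.array(["A", "B", "C", "A", "B", "C"])
all_dummies = np.column_stack([(cats == lv).astype(float) for lv in ["A", "B", "C"]])

# Intercept column plus ALL c = 3 dummies: the dummies sum to the ones column,
# a perfect linear combination.
Xd = np.column_stack([np.ones(len(cats)), all_dummies])

rank = np.linalg.matrix_rank(Xd)   # 3, not 4, so X'X is singular (no inverse)
```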

Binary Predictors

Regional Binaries
- Binaries are commonly used to code regions. For example,
  Midwest = 1 if in the Midwest, 0 otherwise
  Neast = 1 if in the Northeast, 0 otherwise
  Seast = 1 if in the Southeast, 0 otherwise
  West = 1 if in the West, 0 otherwise

Tests for Nonlinearity and Interaction

Tests for Nonlinearity
- Sometimes the effect of a predictor is nonlinear.
- To test for nonlinearity of any predictor, include its square in the regression. For example,

  Y = β0 + β1X1 + β2X1² + β3X2 + β4X2² + ε

- If the linear model is the correct one, the coefficients of the squared predictors β2 and β4 would not differ significantly from zero.
- Otherwise, a quadratic relationship would exist between Y and the respective predictor variable.
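Adding the squared column is mechanical. A sketch on hypothetical data generated with a genuine quadratic term, so the squared coefficient should come out near its true value rather than near zero:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 40
x1 = rng.normal(size=n)
y = 2.0 + 1.0 * x1 + 0.8 * x1 ** 2 + 0.1 * rng.normal(size=n)

# Include the square of the predictor as an extra column.
Xd = np.column_stack([np.ones(n), x1, x1 ** 2])
b, *_ = np.linalg.lstsq(Xd, y, rcond=None)
# b[2] estimates the coefficient on x1^2; a value near zero would support
# the purely linear model instead.
```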

Tests for Nonlinearity and Interaction

Tests for Interaction
- Test for interaction between two predictors by including their product in the regression:

  Y = β0 + β1X1 + β2X2 + β3X1X2 + ε

- If we reject the hypothesis H0: β3 = 0, then we conclude that there is a significant interaction between X1 and X2.
- Interaction effects require careful interpretation and cost 1 degree of freedom per interaction.
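The product column works the same way. Here the hypothetical data are generated noise-free with a known interaction, so least squares recovers the coefficients exactly:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 40
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 3.0 + 1.0 * x1 + 2.0 * x2 + 1.5 * x1 * x2   # exact, noise-free, for illustration

# Include the product x1*x2 as an extra column to test for interaction.
Xd = np.column_stack([np.ones(n), x1, x2, x1 * x2])
b, *_ = np.linalg.lstsq(Xd, y, rcond=None)
# b[3] is the interaction coefficient (beta_3 in the slide's H0: beta_3 = 0).
```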

Multicollinearity

What Is Multicollinearity?
- Multicollinearity occurs when the independent variables X1, X2, …, Xm are intercorrelated instead of being independent.
- Collinearity occurs if only two predictors are correlated.
- The degree of multicollinearity is the real concern.

Multicollinearity

Variance Inflation
• Multicollinearity induces variance inflation when predictors are strongly intercorrelated.
• This results in wider confidence intervals for the true coefficients β1, β2, …, βm and makes the t statistics less reliable.
• The separate contribution of each predictor in “explaining” the response variable is difficult to identify.


Multicollinearity

Correlation Matrix
• To check whether two predictors are correlated (collinearity), inspect the correlation matrix using Excel, MegaStat, or MINITAB.


Multicollinearity

Correlation Matrix
• A quick rule: a sample correlation whose absolute value exceeds 2/√n probably differs significantly from zero in a two-tailed test at α = .05.
• This applies to samples that are not too small (say, 20 or more).
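A small sketch of the quick rule, assuming numpy; the data are fabricated for illustration. It builds the correlation matrix and flags any predictor pair whose |r| exceeds 2/√n.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.2 * rng.normal(size=n)   # strongly related to x1
x3 = rng.normal(size=n)                     # unrelated predictor

X = np.column_stack([x1, x2, x3])
r = np.corrcoef(X, rowvar=False)            # 3x3 correlation matrix
cutoff = 2 / np.sqrt(n)                     # the slide's rule of thumb

# Flag each pair whose sample correlation exceeds the cutoff in absolute value.
flagged = [(i, j) for i in range(3) for j in range(i + 1, 3)
           if abs(r[i, j]) > cutoff]
print(flagged)
```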


Multicollinearity

Predictor Matrix Plots
The collinearity for the squared predictors can often be seen in scatter plots.


Multicollinearity

Variance Inflation Factor (VIF)
• The matrix scatter plots and correlation matrix only show correlations between any two predictors.
• The variance inflation factor (VIF) is a more comprehensive test for multicollinearity.
• For a given predictor j, the VIF is defined as

    VIFj = 1 / (1 − Rj²)

where Rj² is the coefficient of determination when predictor j is regressed against all other predictors.
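The definition can be computed directly: regress each predictor on the others, take Rj², and form 1/(1 − Rj²). A sketch assuming numpy, with hypothetical data in which x2 is deliberately collinear with x1:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 60
x1 = rng.normal(size=n)
x2 = x1 + 0.3 * rng.normal(size=n)   # nearly a copy of x1 (collinear)
x3 = rng.normal(size=n)              # independent predictor
X = np.column_stack([x1, x2, x3])

def vif(X, j):
    """VIF for column j of predictor matrix X (no intercept column)."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    Z = np.column_stack([np.ones(len(y)), others])  # add intercept
    b, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ b
    r2 = 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
    return 1 / (1 - r2)

print([round(vif(X, j), 1) for j in range(3)])
```

Here x1 and x2 get large VIFs while x3 stays near 1, matching the interpretation on the next slide.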

Multicollinearity

Variance Inflation Factor (VIF)
• Some possible situations are:


Multicollinearity

Rules of Thumb
• There is no limit on the magnitude of the VIF.
• A VIF of 10 says that the other predictors “explain” 90% of the variation in predictor j.
• This indicates that predictor j is strongly related to the other predictors.
• However, it is not necessarily indicative of instability in the least squares estimate.
• A large VIF is a warning to consider whether predictor j really belongs to the model.

Multicollinearity

Are Coefficients Stable?
• Evidence of instability is:
- when X1 and X2 have a high pairwise correlation with Y, yet one or both predictors have insignificant t statistics in the fitted multiple regression, and/or
- if X1 and X2 are positively correlated with Y, yet one has a negative slope in the multiple regression.


Multicollinearity

Are Coefficients Stable?
• As a test, try dropping a collinear predictor from the regression and seeing what happens to the fitted coefficients in the re-estimated model.
• If they don’t change much, then multicollinearity is not a concern.
• If it causes sharp changes in one or more of the remaining coefficients in the model, then the multicollinearity may be causing instability.


Violations of Assumptions

• The least squares method makes several assumptions about the (unobservable) random errors εi. Clues about these errors may be found in the residuals ei.
• Assumption 1: The errors are normally distributed.
• Assumption 2: The errors have constant variance (i.e., they are homoscedastic).
• Assumption 3: The errors are independent (i.e., they are nonautocorrelated).

Violations of Assumptions

Non-Normal Errors
• Except when there are major outliers, non-normal residuals are usually considered a mild violation.
• Regression coefficients and variance remain unbiased and consistent.
• Confidence intervals for the parameters may be unreliable since they are based on the normality assumption.
• The confidence intervals are generally OK with a large sample size (e.g., n > 30) and no outliers.

Violations of Assumptions

Non-Normal Errors
• Test:
H0: Errors are normally distributed
H1: Errors are not normally distributed
• Create a histogram of residuals (plain or standardized) to visually reveal any outliers or serious asymmetry.
• The normal probability plot will also visually test for normality.


Violations of Assumptions

Nonconstant Variance (Heteroscedasticity)
• If the error variance is constant, the errors are homoscedastic. If the error variance is nonconstant, the errors are heteroscedastic.
• This violation is potentially serious.
• The least squares regression parameter estimates are unbiased and consistent.
• Estimated variances are biased (understated) and not efficient, resulting in overstated t statistics and narrow confidence intervals.

Violations of Assumptions

Nonconstant Variance (Heteroscedasticity)
• The hypotheses are:
H0: Errors have constant variance (homoscedastic)
H1: Errors have nonconstant variance (heteroscedastic)
• Constant variance can be visually tested by examining scatter plots of the residuals against each predictor.
• Ideally there will be no pattern.

Violations of Assumptions

Nonconstant Variance (Heteroscedasticity)


Violations of Assumptions

Autocorrelation
• Autocorrelation is a pattern of nonindependent errors that violates the assumption that each error is independent of its predecessor.
• This is a problem with time series data.
• Autocorrelated errors result in biased estimated variances, which will result in narrow confidence intervals and large t statistics.
• The model’s fit may be overstated.


Violations of Assumptions

Autocorrelation
• Test the hypotheses:
H0: Errors are nonautocorrelated
H1: Errors are autocorrelated
• We will use the observable residuals e1, e2, …, en for evidence of autocorrelation and the Durbin-Watson test statistic DW:

    DW = [Σ t=2..n (et − et−1)²] / [Σ t=1..n et²]


Violations of Assumptions

Autocorrelation
• The DW statistic lies between 0 and 4.
• When H0 is true (no autocorrelation), the DW statistic will be near 2.
• A DW < 2 suggests positive autocorrelation.
• A DW > 2 suggests negative autocorrelation.
• Ignore the DW statistic for cross-sectional data.
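A minimal implementation of the DW statistic, assuming numpy (the simulated residual series are illustrative, not from the text). Independent residuals give DW near 2, while positively autocorrelated residuals push it toward 0:

```python
import numpy as np

def durbin_watson(e):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2) for residuals e_1..e_n."""
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

rng = np.random.default_rng(3)
iid = rng.normal(size=500)          # independent residuals -> DW near 2

# Positively autocorrelated residuals (AR(1) with rho = 0.8) -> DW well below 2.
ar = np.empty(500)
ar[0] = iid[0]
for t in range(1, 500):
    ar[t] = 0.8 * ar[t - 1] + iid[t]

print(round(durbin_watson(iid), 2), round(durbin_watson(ar), 2))
```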


Violations of Assumptions

Unusual Observations
• An observation may be unusual
1. because the fitted model’s prediction is poor (unusual residuals), or
2. because one or more predictors may be having a large influence on the regression estimates (unusual leverage).


Violations of Assumptions

Unusual Observations
• To check for unusual residuals, simply inspect the residuals to find instances where the model does not predict well.
• To check for unusual leverage, look at the leverage statistic (how far each observation is from the mean(s) of the predictors) for each observation.
• For n observations and k predictors, look for observations whose leverage exceeds 2(k + 1)/n.
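The leverage screen can be sketched with the hat matrix H = X(XᵀX)⁻¹Xᵀ, whose diagonal holds the leverages. A numpy sketch with one deliberately extreme (hypothetical) observation:

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 30, 2
# Design matrix: intercept column plus k standard-normal predictors.
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
X[0, 1] = 8.0                     # one predictor value far from its mean

H = X @ np.linalg.inv(X.T @ X) @ X.T
leverage = np.diag(H)             # leverages sum to k + 1
cutoff = 2 * (k + 1) / n          # = 0.2 here

flagged = np.where(leverage > cutoff)[0]
print(flagged)
```

Observation 0 is flagged because its predictor value sits far from the predictor mean, exactly the situation the slide describes.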

Other Regression Topics

Outliers: Causes and Cures
• An outlier may be due to an error in recording the data and, if so, the observation should be deleted.
• It is reasonable to discard an observation on the grounds that it represents a different population than the other observations.


Other Regression Topics

Missing Predictors
• An outlier may also be an observation that has been influenced by an unspecified “lurking” variable that should have been controlled but wasn’t.
• Try to identify the lurking variable and formulate a multiple regression model including both predictors.
• Unspecified “lurking” variables cause inaccurate predictions from the fitted regression.

Other Regression Topics

Ill-Conditioned Data
• All variables in the regression should be of the same general order of magnitude.
• Do not mix very large data values with very small data values.
• To avoid mixing magnitudes, adjust the decimal point in both variables.
• Be consistent throughout the data column.
• The decimal adjustments for each data column need not be the same.

Other Regression Topics

Significance in Large Samples
• Statistical significance may not imply practical importance.
• Anything can be made significant if you get a large enough sample.


Other Regression Topics

Model Specification Errors
• A misspecified model occurs when you estimate a linear model when actually a nonlinear model is required, or when a relevant predictor is omitted.
• To detect misspecification:
- Plot the residuals against estimated Y (should be no discernible pattern).
- Plot the residuals against actual Y (should be no discernible pattern).
- Plot the fitted Y against the actual Y (should be a 45° line).

Other Regression Topics

Missing Data
• Discard a variable if many data values are missing.
• If a Y value is missing, discard the observation to be conservative.
• Other options would be to use the mean of the X data column for the missing values or to use a regression procedure to “fit” the missing X value from the complete observations.
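The mean-fill option is simple to sketch in plain Python (the values are hypothetical):

```python
# Fill a missing X value with the mean of the complete observations
# in that column -- the simple imputation option described above.
x = [2.0, 4.0, None, 6.0, 8.0]

observed = [v for v in x if v is not None]
mean = sum(observed) / len(observed)          # mean of complete cases
filled = [mean if v is None else v for v in x]
print(filled)
```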


Other Regression Topics

Binary Dependent Variable
• When the response variable Y is binary (0, 1), the least squares estimation method is no longer appropriate.
• Use logit and probit regression methods.


Other Regression Topics

Stepwise and Best Subsets Regression
• The stepwise regression procedure finds the best fitting model using 1, 2, 3, …, k predictors.
• This procedure is appropriate only when there is no theoretical model that specifies which predictors should be used.
• Perform best subsets regression using all possible combinations of predictors.


Applied Statistics in Business and Economics

End of Chapter 13