
Page 1: Multiple Regression Presented By Muhammad Danish MBAEVE-IV

Multiple Regression

Presented by Muhammad Danish, MBAEVE-IV

Page 2: Multiple Regression Presented By Muhammad Danish MBAEVE-IV

Multiple regression is regression in which there are two or more independent variables on the right-hand side of the equation.

Page 3: Multiple Regression Presented By Muhammad Danish MBAEVE-IV

Simple regression:

True relation: $Y_i = \alpha + \beta X_i + \varepsilon_i$

Estimated relation: $Y_i = a + b X_i + e_i$

Multiple regression:

True relation: $Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki} + \varepsilon_i$

Estimated relation: $Y_i = a + b_1 X_{1i} + b_2 X_{2i} + \dots + b_k X_{ki} + e_i$

The number of X's (independent variables) will be denoted as k. We are estimating k + 1 parameters: the k $\beta$'s and the constant $\alpha$.

Page 4: Multiple Regression Presented By Muhammad Danish MBAEVE-IV

We have assumptions similar to the ones we used in simple regression. The assumptions are:

• The Y values are independent of each other.

• The conditional distributions of Y given the X’s are normal.

• The conditional standard deviations of Y given the X’s are equal for all values of the X’s.

Page 5: Multiple Regression Presented By Muhammad Danish MBAEVE-IV

We continue to use OLS (ordinary least squares).

Multiple regression is much more difficult to do with a hand calculator than simple regression, but computer programs perform it quickly and easily.
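As a quick illustration (the data here are invented for the example), a multiple regression can be fit by OLS in Python with NumPy's least-squares solver:

```python
import numpy as np

# Invented data: n = 30 observations, k = 2 independent variables.
rng = np.random.default_rng(0)
n, k = 30, 2
X = rng.normal(size=(n, k))
y = 10 + 1.5 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=n)

# OLS: prepend a column of ones for the constant, then solve the
# least-squares problem min ||y - Xb||^2.
X1 = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
print(coef)  # k + 1 estimates: the constant a, then b1 and b2
```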

Page 6: Multiple Regression Presented By Muhammad Danish MBAEVE-IV

As in simple regression, we have

$SST = \sum (Y_i - \bar{Y})^2$

$SSR = \sum (\hat{Y}_i - \bar{Y})^2$

$SSE = \sum (Y_i - \hat{Y}_i)^2$
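In code, with a small invented data set, the three sums of squares (and the standard error of the estimate from the next slide) look like this:

```python
import numpy as np

# Invented data set with k = 2 independent variables, fit by OLS.
y = np.array([3.0, 5.0, 7.0, 6.0, 9.0, 8.0])
X = np.column_stack([np.ones(6),
                     [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
                     [2.0, 1.0, 4.0, 3.0, 5.0, 4.0]])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ coef

# The three sums of squares; with a constant in the model, SST = SSR + SSE.
SST = np.sum((y - y.mean()) ** 2)
SSR = np.sum((y_hat - y.mean()) ** 2)
SSE = np.sum((y - y_hat) ** 2)
print(SST, SSR + SSE)  # equal up to rounding

# Standard error of the estimate (see the next slide): n = 6, k = 2.
s_e = np.sqrt(SSE / (6 - 2 - 1))
print(s_e)
```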

Page 7: Multiple Regression Presented By Muhammad Danish MBAEVE-IV

The standard error of the regression, or standard error of the estimate, is

$s_e = \sqrt{\dfrac{SSE}{n - k - 1}}$

In simple regression there was only one X, so k was 1 and our denominator was (n – 2). Here the denominator is generalized to (n – k – 1).

Page 8: Multiple Regression Presented By Muhammad Danish MBAEVE-IV

The Regression ANOVA Table is now:

Source of variation    Sum of squares    Degrees of freedom    Mean square
Regression             SSR               k                     MSR = SSR/k
Error                  SSE               n – k – 1             MSE = SSE/(n – k – 1)
Total                  SST               n – 1                 MST = SST/(n – 1)

where $SST = \sum (Y_i - \bar{Y})^2$, $SSR = \sum (\hat{Y}_i - \bar{Y})^2$, and $SSE = \sum (Y_i - \hat{Y}_i)^2$.

Page 9: Multiple Regression Presented By Muhammad Danish MBAEVE-IV

The hypotheses for testing the overall significance of the regression are:

H0: $\beta_1 = \beta_2 = \dots = \beta_k = 0$ (all the slope coefficients are zero)

H1: at least one of the $\beta$'s is not zero.

The statistic for the test is

$F_{k,\,n-k-1} = \dfrac{MSR}{MSE} = \dfrac{SSR/k}{SSE/(n-k-1)}$
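As a sketch, the F statistic and its p-value can be computed with SciPy; the sums of squares here are taken from the worked example on Page 21:

```python
from scipy import stats

# Sums of squares from the Page 21 example: n = 30, k = 2.
SSR, SSE, n, k = 25414.01, 8573.80, 30, 2

MSR = SSR / k
MSE = SSE / (n - k - 1)
F = MSR / MSE

# p-value from the F distribution with (k, n - k - 1) degrees of freedom.
print(F, stats.f.sf(F, k, n - k - 1))
```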

Page 10: Multiple Regression Presented By Muhammad Danish MBAEVE-IV

We can also test whether a particular coefficient $\beta_j$ is zero (or any other specified value), using a t-statistic:

$t_{n-k-1} = \dfrac{b_j - \beta_j}{s_{b_j}}$

The calculation of $s_{b_j}$ is very messy, but it is always given on computer output.

We can do one-tailed and two-tailed tests.
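A sketch of the t test in code, using the HGT numbers from the example on Page 21:

```python
from scipy import stats

# From the Page 21 example: b_j = 4.378, s_bj = 1.103, n - k - 1 = 27.
b_j, s_bj, dof = 4.378, 1.103, 27

t = (b_j - 0) / s_bj                # testing H0: beta_j = 0
print(t)
print(2 * stats.t.sf(abs(t), dof))  # two-tailed p-value
print(stats.t.sf(t, dof))           # one-tailed p-value for H1: beta_j > 0
```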

Page 11: Multiple Regression Presented By Muhammad Danish MBAEVE-IV

Coefficient of determination, or $R^2$:

$R^2 = \dfrac{SSR}{SST} = 1 - \dfrac{SSE}{SST} = \dfrac{\sum (\hat{Y}_i - \bar{Y})^2}{\sum (Y_i - \bar{Y})^2}$

$R^2$ adjusted or corrected for degrees of freedom:

$R^2_c = 1 - \dfrac{SSE/(n-k-1)}{SST/(n-1)}$  or  $R^2_c = 1 - (1 - R^2)\,\dfrac{n-1}{n-k-1}$
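Both formulas in code, again with the sums of squares from the Page 21 example:

```python
# Sums of squares from the Page 21 example.
SSR, SSE, n, k = 25414.01, 8573.80, 30, 2
SST = SSR + SSE

R2 = SSR / SST
R2_c = 1 - (SSE / (n - k - 1)) / (SST / (n - 1))
R2_c_alt = 1 - (1 - R2) * (n - 1) / (n - k - 1)  # equivalent form
print(R2, R2_c, R2_c_alt)  # about 0.7477, 0.729, 0.729
```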

Page 12: Multiple Regression Presented By Muhammad Danish MBAEVE-IV

Dummy Variables

Dummy variables enable us to explore the effects of qualitative rather than quantitative factors.

Side note: Cross-sectional data provides us with information on a number of households, individuals, firms, etc. at a particular point in time. Time-series data gives us information on a particular household, firm, etc. at various points in time.

Suppose, for example, we have cross-sectional data on income. Dummy variables can give us an understanding of how race, gender, and residence in an urban area affect income.

If we have time-series data on expenditures, dummy variables can tell us about seasonal effects.

Page 13: Multiple Regression Presented By Muhammad Danish MBAEVE-IV

To capture the effects of a factor that has m categories, you need m – 1 dummy variables. Here are some examples.

Gender: You are examining SAT scores. Since there are 2 gender categories, you need 1 gender variable to capture the effect of gender. If you include a variable that is 1 for male observations and 0 for females, the coefficient on that variable tells how male scores compare to female scores. In this case, female is the reference category.

Race: You are examining salaries and you have data for 4 races: white, black, Asian, and Native American. You only need 3 dummy variables. You might define a variable that is 1 for blacks and 0 for non-blacks, a 2nd variable that is 1 for Asians and 0 for non-Asians, and a 3rd variable that is 1 for Native Americans and 0 for non-Native Americans. Then white would be the reference category, and the coefficients of the 3 race variables would tell how salaries for those groups compare to salaries for whites.
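A sketch of this coding in pandas (the data frame is invented); listing the category levels explicitly makes white the first level, so drop_first=True leaves it as the reference category:

```python
import pandas as pd

# Invented data with a 4-category race variable.
df = pd.DataFrame({
    "salary": [55.0, 48.0, 62.0, 51.0],
    "race": ["white", "black", "asian", "native_american"],
})

# m = 4 categories -> m - 1 = 3 dummy variables.
df["race"] = pd.Categorical(
    df["race"],
    categories=["white", "black", "asian", "native_american"])
dummies = pd.get_dummies(df["race"], prefix="race", drop_first=True)
print(pd.concat([df["salary"], dummies], axis=1))
```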

Page 14: Multiple Regression Presented By Muhammad Danish MBAEVE-IV

Coefficient interpretation example: You have estimated the regression

SALARY = 10.0 + 1.0 (EDUC) + 2.0 (EXP) – 5.0 (FEMALE)

where SALARY is measured in thousands of dollars, EDUC and EXP are education and experience, each measured in years, and FEMALE is a dummy variable equal to 1 for females and 0 for males. The coefficients would be interpreted as follows.

If there are two people with the same experience and gender, and one has 1 more unit of education (in this case, a year), that person would be expected to have a salary that is 1.0 unit higher (in this case, 1.0 thousand dollars higher).

If there are two people with the same education and gender, and one has 1 more year of experience, that person would be expected to have a salary that is 2.0 thousand dollars higher.

If there are two people with the same education and experience, and one is male and one is female, the female is expected to have a salary that is 5.0 thousand dollars less.
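As a quick check, a small helper (hypothetical, written just for this example) evaluates the fitted equation; its output matches the table worked out on the following slides:

```python
def predicted_salary(educ: float, exp: float, female: int) -> float:
    """Fitted salary in thousands: 10.0 + 1.0*EDUC + 2.0*EXP - 5.0*FEMALE."""
    return 10.0 + 1.0 * educ + 2.0 * exp - 5.0 * female

# The four people considered on the next slides: 30, 31, 33, 28.
for educ, exp, female in [(10, 5, 0), (11, 5, 0), (11, 6, 0), (11, 6, 1)]:
    print(predicted_salary(educ, exp, female))
```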

Page 15: Multiple Regression Presented By Muhammad Danish MBAEVE-IV

SALARY = 10.0 + 1.0 (EDUC) + 2.0 (EXP) – 5.0 (FEMALE)

Consider 4 people with the following characteristics.

education    experience    female    salary
10           5             0
11           5             0
11           6             0
11           6             1

Page 16: Multiple Regression Presented By Muhammad Danish MBAEVE-IV

SALARY = 10.0 + 1.0 (EDUC) + 2.0 (EXP) – 5.0 (FEMALE)

Consider 4 people with the following characteristics.

education    experience    female    salary
10           5             0         10 + 10 + 10 – 0 = 30
11           5             0
11           6             0
11           6             1

Page 17: Multiple Regression Presented By Muhammad Danish MBAEVE-IV

SALARY = 10.0 + 1.0 (EDUC) + 2.0 (EXP) – 5.0 (FEMALE)

Consider 4 people with the following characteristics.

education    experience    female    salary
10           5             0         10 + 10 + 10 – 0 = 30
11           5             0         10 + 11 + 10 – 0 = 31
11           6             0
11           6             1

If two people have the same experience and gender, the one with one more year of education would be expected to earn 1.0 thousand dollars more.

Page 18: Multiple Regression Presented By Muhammad Danish MBAEVE-IV

SALARY = 10.0 + 1.0 (EDUC) + 2.0 (EXP) – 5.0 (FEMALE)

Consider 4 people with the following characteristics.

education    experience    female    salary
10           5             0         10 + 10 + 10 – 0 = 30
11           5             0         10 + 11 + 10 – 0 = 31
11           6             0         10 + 11 + 12 – 0 = 33
11           6             1

If two people have the same education and gender, the one with one more year of experience would be expected to earn 2.0 thousand dollars more.

Page 19: Multiple Regression Presented By Muhammad Danish MBAEVE-IV

SALARY = 10.0 + 1.0 (EDUC) + 2.0 (EXP) – 5.0 (FEMALE)

Consider 4 people with the following characteristics.

education    experience    female    salary
10           5             0         10 + 10 + 10 – 0 = 30
11           5             0         10 + 11 + 10 – 0 = 31
11           6             0         10 + 11 + 12 – 0 = 33
11           6             1         10 + 11 + 12 – 5 = 28

If two people have the same education and experience, the female would be expected to earn 5.0 thousand dollars less than the male.

Page 20: Multiple Regression Presented By Muhammad Danish MBAEVE-IV

Suppose you have regression results based on quarterly data for a particular household:

SPENDING = 10.0 + 0.70 (INCOME) + 3.0 (WINTER) + 2.0 (SPRING) – 1.0 (SUMMER)

SPENDING and INCOME are in thousands of dollars. WINTER equals 1 if the quarter is winter and 0 if it is fall, spring, or summer. SPRING is 1 if the quarter is spring and 0 otherwise. SUMMER is 1 if the quarter is summer and 0 otherwise.

Suppose household income is 10 thousand dollars for all 4 quarters of a particular year. In the fall, spending would be expected to be 17 thousand dollars. In the spring, spending would be expected to be 2.0 thousand dollars higher than in the fall, or 19 thousand dollars. In the winter, spending would be expected to be 3.0 thousand dollars higher than in the fall, or 20 thousand dollars. In the summer, spending would be expected to be 1.0 thousand dollars less than in the fall, or 16 thousand dollars.
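The same arithmetic as a small hypothetical helper:

```python
def predicted_spending(income: float, winter: int, spring: int, summer: int) -> float:
    """Fitted quarterly spending in thousands of dollars."""
    return 10.0 + 0.70 * income + 3.0 * winter + 2.0 * spring - 1.0 * summer

# Income of 10 thousand dollars in every quarter.
print(predicted_spending(10, 0, 0, 0))  # fall (reference quarter): 17
print(predicted_spending(10, 1, 0, 0))  # winter: 20
print(predicted_spending(10, 0, 1, 0))  # spring: 19
print(predicted_spending(10, 0, 0, 1))  # summer: 16
```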

Page 21: Multiple Regression Presented By Muhammad Danish MBAEVE-IV

Example: You have run a regression with 30 observations. The dependent variable, WGT, is weight measured in pounds. The independent variables are HGT, height measured in inches, and a dummy variable, MALE, which is 1 if the person is male and 0 if the person is female. The results are shown below. Answer the questions that follow.

variable     estimated coefficient    estimated std. error
CONSTANT     -160.129                 50.285
HGT          4.378                    1.103
MALE         27.478                   9.520

source of variation    sum of squares    degrees of freedom    mean square
regression             25,414.01         2                     12,707.01
error                  8,573.80          27                    317.48
total                  33,987.81         29                    1,171.99

Page 22: Multiple Regression Presented By Muhammad Danish MBAEVE-IV

1. Interpret the HGT coefficient.

(Regression output repeated from Page 21.)

If there are 2 people of the same gender and one is an inch taller than the other, the taller one is expected to weigh 4.378 pounds more.

Page 23: Multiple Regression Presented By Muhammad Danish MBAEVE-IV

2. Interpret the MALE coefficient.

(Regression output repeated from Page 21.)

If there are 2 people of the same height, and one is male and one is female, the male is expected to weigh 27.478 pounds more.

Page 24: Multiple Regression Presented By Muhammad Danish MBAEVE-IV

3. Calculate and interpret the coefficient of determination R2. Also calculate the adjusted R2.

(Regression output repeated from Page 21.)

$R^2 = \dfrac{SSR}{SST} = \dfrac{25{,}414.01}{33{,}987.81} = 0.7477$

About 75% of the variation in weight is explained by the regression on height and gender.

$R^2_c = 1 - \dfrac{SSE/(n-k-1)}{SST/(n-1)} = 1 - \dfrac{8{,}573.80/27}{33{,}987.81/29} = 0.729$

Page 25: Multiple Regression Presented By Muhammad Danish MBAEVE-IV

4. Test at the 5% level whether the HGT coefficient is greater than zero. (Note that this is the alternative hypothesis.)

(Regression output repeated from Page 21.)

$t_{27} = \dfrac{b_j - \beta_j}{s_{b_j}} = \dfrac{4.378 - 0}{1.103} = 3.97$

From our t table, we see that for 27 dof, and a 1-tailed 5% critical region, our critical value is 1.703. Since the value of our statistic is 3.97, we reject H0 and accept H1: the HGT coefficient is greater than zero.

[Sketch: the $t_{27}$ distribution, with the 5% critical region to the right of the critical value 1.703.]
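If no t table is handy, SciPy's inverse CDF reproduces the critical values used in questions 4 and 5:

```python
from scipy import stats

# One-tailed 5% critical value, 27 degrees of freedom (question 4).
print(stats.t.ppf(0.95, 27))   # about 1.703

# Two-tailed 1% critical values, 0.5% in each tail (question 5).
print(stats.t.ppf(0.995, 27))  # about 2.771 (and -2.771 by symmetry)
```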

Page 26: Multiple Regression Presented By Muhammad Danish MBAEVE-IV

5. Test at the 1% level whether the MALE coefficient is different from zero. (Note that this is the alternative hypothesis.)

(Regression output repeated from Page 21.)

$t_{27} = \dfrac{b_j - \beta_j}{s_{b_j}} = \dfrac{27.478 - 0}{9.520} = 2.89$

From our t table, we see that for 27 dof, and a 2-tailed 1% critical region, our critical values are 2.771 and -2.771. Since the value of our statistic is 2.89, we reject H0 and accept H1: the MALE coefficient is different from zero.

[Sketch: the $t_{27}$ distribution, with 0.5% critical regions below –2.771 and above 2.771.]

Page 27: Multiple Regression Presented By Muhammad Danish MBAEVE-IV

6. Test the overall significance of the regression at the 1% level.

(Regression output repeated from Page 21.)

$F_{2,\,27} = \dfrac{MSR}{MSE} = \dfrac{12{,}707.01}{317.48} = 40.02$

From our F table, we see that for 2 and 27 dof, and a 1% critical region, our critical value is 5.49. Since the value of our statistic is 40.02, we reject H0 and accept H1: at least one of the slope coefficients is not zero.

[Sketch: the $F_{2,27}$ density, with the 1% critical region to the right of the critical value 5.49 and the acceptance region to its left.]
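The F critical value and the statistic can be checked the same way with SciPy:

```python
from scipy import stats

# 1% critical value of F with (2, 27) degrees of freedom.
print(stats.f.ppf(0.99, 2, 27))  # about 5.49

# The test statistic from the ANOVA table.
print(12707.01 / 317.48)         # about 40.02
```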

Page 28: Multiple Regression Presented By Muhammad Danish MBAEVE-IV

Multicollinearity Problem

Multicollinearity arises when the independent variables (the X's) are highly correlated.

Then it is not possible to separate the effects of these variables on the dependent variable Y.

The slope coefficient estimates will tend to be unreliable and often are not significantly different from zero.

The simplest solution is to delete one of the correlated variables.

Page 29: Multiple Regression Presented By Muhammad Danish MBAEVE-IV

Example: You are exploring the factors influencing the number of children that a couple has.

You have included as X’s the mother’s education and the father’s education.

You find that neither coefficient appears to be statistically significantly different from zero.

This may occur because the two education variables are highly correlated.

One option is to include only the education of one parent.

Alternatively, you could replace the two education variables with a single variable, such as the average or total education of the parents.
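A sketch with invented education data, showing how the problem can be spotted with a simple correlation and how one of the fixes looks:

```python
import numpy as np

# Invented data: father's education closely tracks mother's education.
rng = np.random.default_rng(1)
mother_educ = rng.normal(12, 2, size=100)
father_educ = mother_educ + rng.normal(scale=0.5, size=100)

# A correlation near 1 signals a multicollinearity problem.
print(np.corrcoef(mother_educ, father_educ)[0, 1])

# One fix from this slide: replace the pair with a single variable.
parent_educ = (mother_educ + father_educ) / 2
```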

Page 30: Multiple Regression Presented By Muhammad Danish MBAEVE-IV

Problem of Autocorrelation or Serial Correlation

This is a problem that may arise in time-series data, but generally not in cross-sectional data.

It occurs when successive observations of the dependent variable Y are not independent of each other.

For example, if you are examining the weight of a particular person over time, and that weight is particularly high in one period, it is likely to be high in the next period as well.

So, if the residual $e_i = Y_i - \hat{Y}_i$ is greater than zero in period 7, for example, it is likely that the residual will be greater than zero in period 8 as well.

Therefore, the residuals tend to be correlated among themselves (autocorrelated) rather than independent.

Page 31: Multiple Regression Presented By Muhammad Danish MBAEVE-IV

You can test for autocorrelation using the Durbin-Watson statistic

$d = \dfrac{\sum_{i=2}^{n} (e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2}$

The Durbin-Watson statistic d is always between 0 and 4.

When there is extreme negative autocorrelation, d will be near 4.

When there is extreme positive autocorrelation, d will be near 0.

When there is no problem of autocorrelation, d will be near 2.

In many computer statistical packages, you can request that the Durbin-Watson statistic be provided as output.

You can look up critical values in a table that then allows you to determine if you have an autocorrelation problem.
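A minimal sketch of the statistic computed on simulated residuals (statsmodels also provides a ready-made durbin_watson function):

```python
import numpy as np

def durbin_watson(e: np.ndarray) -> float:
    """d = sum over i=2..n of (e_i - e_{i-1})^2, divided by sum of e_i^2."""
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

rng = np.random.default_rng(2)

# Positively autocorrelated residuals: d comes out well below 2.
e = np.zeros(100)
for i in range(1, 100):
    e[i] = 0.9 * e[i - 1] + rng.normal()
print(durbin_watson(e))

# Independent residuals: d comes out near 2.
print(durbin_watson(rng.normal(size=100)))
```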

Page 32: Multiple Regression Presented By Muhammad Danish MBAEVE-IV

The Durbin-Watson table provides two numbers dL and dU corresponding to the number n of observations and the number k of explanatory variables (X's). Your textbook provides one-tailed values, so you can test for “positive autocorrelation” or “negative autocorrelation” but not “positive or negative autocorrelation” at the same time. The null hypothesis is that there is no autocorrelation.

The following regions of the 0-to-4 scale indicate positive autocorrelation, negative autocorrelation, no autocorrelation, or are inconclusive:

• 0 ≤ d < dL: positive autocorrelation

• dL ≤ d < dU: inconclusive

• dU ≤ d ≤ 4 – dU: no autocorrelation problem

• 4 – dU < d ≤ 4 – dL: inconclusive

• 4 – dL < d ≤ 4: negative autocorrelation

Page 33: Multiple Regression Presented By Muhammad Danish MBAEVE-IV

Example: You have run a time-series regression with 25 observations and 4 independent variables. Your Durbin-Watson statistic is d = 0.70. Test at the 1% level whether you have a positive autocorrelation problem.

The Durbin-Watson table indicates that for 25 observations and 4 independent variables, dL = 0.83 and dU = 1.52. This implies the following regions.

• 0 ≤ d < 0.83: positive autocorrelation

• 0.83 ≤ d < 1.52: inconclusive

• 1.52 ≤ d ≤ 2.48: no autocorrelation problem

• 2.48 < d ≤ 3.17: inconclusive

• 3.17 < d ≤ 4: negative autocorrelation

Since d = 0.70 falls below dL = 0.83, you reject H0: no autocorrelation and accept H1: there is a positive autocorrelation problem. There are techniques for handling autocorrelation problems, but they are beyond the scope of this course.

Page 34: Multiple Regression Presented By Muhammad Danish MBAEVE-IV

THANK YOU