Econ 140, Lecture 17: Multiple Regression Applications II & III


Post on 21-Dec-2015


Today’s Plan

• Two topics and how they relate to multiple regression

– Multicollinearity

– Dummy variables

Multicollinearity

• Suppose we have the following regression equation:

Y = a + b1X1 + b2X2 + e

• Multicollinearity occurs when some or all of the independent X variables are linearly related

• Different forms of multicollinearity:

– Perfect: OLS estimation will not work

– Non-perfect: arises frequently in applied work

Multicollinearity Example

• Again we’ll use returns to education where:

– the dependent variable Y is (log) wages

– the independent variables (X’s) are age, experience, and years of schooling

• Experience is defined as years in the labor force, or the difference between age and years of schooling

– this can be written: Experience = Age - Years of school

– What’s the problem with this?

Multicollinearity Example (2)

• Note that we’ve expressed experience as the difference of two of our other independent variables

– by constructing experience in this manner we create a collinear dependence between age and experience

– the relationship between age and experience is a linear relationship such that: as age increases, for given years of schooling, experience also increases

• We can write our regression equation for this example:

Wages = a + b1Experience + b2Age + e

Multicollinearity Example (3)

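• Recall that our estimate for b1 is

$$\hat{b}_1 = \frac{\sum x_1 y \sum x_2^2 - \sum x_2 y \sum x_1 x_2}{\sum x_1^2 \sum x_2^2 - \left(\sum x_1 x_2\right)^2}$$

where x1 = experience and x2 = age (both measured as deviations from their means)

• The problem is that x1 and x2 are linearly related

– as we get closer to perfect linearity, the denominator will go to zero

– OLS won’t work!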
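The vanishing denominator can be checked numerically. The following is a minimal sketch using made-up data (five hypothetical workers, not the spreadsheet sample); the helper name `slope_denominator` is an assumption introduced here for illustration.

```python
import numpy as np

def slope_denominator(x1, x2):
    """Denominator of the two-regressor OLS slope formula, in deviations from means."""
    d1 = x1 - x1.mean()
    d2 = x2 - x2.mean()
    return (d1 @ d1) * (d2 @ d2) - (d1 @ d2) ** 2

# Hypothetical data: everyone has 12 years of schooling,
# so experience = age - 12 is an exact linear function of age.
age = np.array([25.0, 30.0, 35.0, 40.0, 45.0])
experience_exact = age - 12.0

# Same ages, but schooling now differs across individuals.
schooling = np.array([12.0, 16.0, 10.0, 14.0, 12.0])
experience_varied = age - schooling

denom_exact = slope_denominator(experience_exact, age)
denom_varied = slope_denominator(experience_varied, age)
print("perfectly collinear:", denom_exact)
print("schooling varies:   ", denom_varied)
```

With the exactly collinear pair the denominator is zero, so the slope formula is undefined; once schooling varies, the denominator is positive and the slopes can be computed.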

Multicollinearity Example (4)

• Recall that the estimated variance of $\hat{b}_1$ is:

$$\hat{\sigma}^2_{\hat{b}_1} = \frac{\hat{\sigma}^2_{YX} \sum x_2^2}{\sum x_1^2 \sum x_2^2 - \left(\sum x_1 x_2\right)^2}$$

– So as x1 and x2 approach perfect collinearity, the denominator goes to zero and the estimated variance of $\hat{b}_1$ increases

• Implications:

– with multicollinearity, you will get large standard errors on partial coefficients

– your t-ratios, given the null hypothesis that the value of the coefficient is zero, will be small
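To see how quickly the variance grows, the sketch below evaluates the variance expression on simulated data while experience is made a progressively tighter function of age. The data, the fixed residual variance `sigma2`, and the function name `slope_variance` are all assumptions made for illustration.

```python
import numpy as np

def slope_variance(x1, x2, sigma2):
    """Variance of the slope on x1 in a two-regressor model (deviation form)."""
    d1 = x1 - x1.mean()
    d2 = x2 - x2.mean()
    denom = (d1 @ d1) * (d2 @ d2) - (d1 @ d2) ** 2
    return sigma2 * (d2 @ d2) / denom

rng = np.random.default_rng(0)
age = rng.uniform(20, 60, size=200)   # simulated ages
sigma2 = 1.0                          # assumed residual variance, held fixed

variances = []
for noise_sd in (10.0, 1.0, 0.1):
    # Smaller noise means experience is closer to an exact function of age.
    experience = age - 12.0 + rng.normal(0.0, noise_sd, size=200)
    variances.append(slope_variance(experience, age, sigma2))

for noise_sd, v in zip((10.0, 1.0, 0.1), variances):
    print(f"noise sd {noise_sd:>4}: variance of slope = {v:.5f}")
```

The variance rises sharply as the noise shrinks, which is the source of the large standard errors and small t-ratios listed under the implications.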

More Multicollinearity Examples

• On L16_1.xls we have individual data on age, years of education, weekly earnings, school age, and experience

– we can perform a regression to calculate returns given age and experience

– we can also estimate bivariate models including only age, only experience, and only years of schooling

– we expect that the problem is that experience is related to age (to test this, we can regress age on experience)

• if age turns out to be an exact linear function of experience (a slope coefficient of 1 with no residual variation), there is perfect multicollinearity

More Multicollinearity Examples (2)

• On L16_2.xls there’s a made-up example of perfect multicollinearity

– OLS is unable to calculate the slope coefficients

– calculating the products and cross-products, we find that the denominator for the slope coefficients is zero, as predicted

• If what we have is an applied problem with non-perfect multicollinearity, it has these properties:

1) OLS is still unbiased

2) Large variances and standard errors, which make hypothesis testing difficult

3) Few significant coefficients but a high R2

More Multicollinearity Examples (3)

• What to do with L16_2.xls?

– There’s simply not enough variation

– We can collect more data or rethink the model

– We can test for partial correlations between the X variables (as demonstrated on L16_1.xls).
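One way to run the check described above is an auxiliary regression of one X variable on another, followed by a look at its fit. The sketch below uses small made-up arrays rather than the spreadsheet data, and `r_squared` is a helper name introduced here.

```python
import numpy as np

def r_squared(y, x):
    """R^2 from regressing y on a constant and x."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    tss = np.sum((y - y.mean()) ** 2)
    return 1.0 - (resid @ resid) / tss

# Hypothetical values for six workers.
experience = np.array([3.0, 8.0, 15.0, 22.0, 30.0, 41.0])
age_exact = experience + 18.0   # exact linear dependence
age_noisy = age_exact + np.array([2.0, -1.0, 3.0, -2.0, 1.0, -3.0])

r2_exact = r_squared(age_exact, experience)
r2_noisy = r_squared(age_noisy, experience)
print("exact dependence:", r2_exact)
print("noisy dependence:", r2_noisy)
```

An R2 of one in the auxiliary regression signals perfect multicollinearity; a value close to one signals the non-perfect case, where OLS runs but the standard errors are inflated.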

Dummy variables

• Dummy variables allow you to include qualitative variables (or variables that otherwise cannot be quantified) in your regression

– examples include: gender, race, marital status, and religion

– they also become important when looking at “regime shifts”, which may be new policy initiatives, economic change, or seasonality

• We will look at some examples:

– using female as a qualitative variable

– using marital status as a qualitative variable

– using the Phillips curve to demonstrate a regime shift

Qualitative example: female

• We’ll construct a dummy variable:

Di = 0 if not female i = 1, …n

Di = 1 if female

– We can do this with any qualitative variable

– Note: assigning the values for the dummy variable is an arbitrary choice

• On L17_1.xls there is a sample from the current CPS

– to create the dummy variable “female” we assign the values one and zero to the CPS values of two and one for sex, respectively

– we can include the dummy variable in the regression equation like we would any other variable

Qualitative example: female (2)

• We estimate the following equation:

$$\hat{Y}_i = 5.975 - 0.485 D_i$$

• Now we can ask: what are the expected earnings given that a person is male?

$$E(Y_i \mid D_i = 0) = a + b(0) = a = 5.975$$

• Similarly, what are the expected earnings given that a person is female?

$$E(Y_i \mid D_i = 1) = a + b(1) = a + b = 5.975 - 0.485 = 5.490$$
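With a single dummy regressor, the intercept is the mean of the D = 0 group and the slope is the difference between the group means. The sketch below checks this on six made-up observations (not the CPS sample) and repeats the slide's arithmetic; all variable names are illustrative.

```python
import numpy as np

# Hypothetical log weekly earnings; female = 1 marks a female respondent.
log_wage = np.array([6.1, 5.9, 6.0, 5.4, 5.6, 5.5])
female = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

# Regress log wage on a constant and the dummy.
X = np.column_stack([np.ones_like(female), female])
(a_hat, b_hat), *_ = np.linalg.lstsq(X, log_wage, rcond=None)

mean_male = log_wage[female == 0].mean()
mean_female = log_wage[female == 1].mean()
print("a_hat:", a_hat, "male mean:", mean_male)
print("a_hat + b_hat:", a_hat + b_hat, "female mean:", mean_female)

# The slide's estimates: expected log earnings by group.
a_slide, b_slide = 5.975, -0.485
expected_male = a_slide + b_slide * 0
expected_female = a_slide + b_slide * 1
print(expected_male, round(expected_female, 3))
```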

Qualitative example: female (4)

• We can use other variables to extend our analysis

• for example we can include age to get the equation:

Y = a + b1Di + b2Xi + e

– where Xi can be any or all relevant variables

– Di and the related coefficient b1 will indicate how much less, on average, females earn than males

– for males the intercept will be $\hat{a}$

– for females the intercept will be $\hat{a} + \hat{b}_1$

Qualitative example: female (5)

• The estimated regression found on the spreadsheet is

$$\hat{Y}_i = 5.085 - 0.656 D_i + 0.023 X_i$$

• The expected weekly earnings for men are:

$$E(Y_i \mid D_i = 0) = a + b_2 X_i$$

• The expected weekly earnings for women are:

$$E(Y_i \mid D_i = 1) = (a + b_1) + b_2 X_i$$

Qualitative example: female (6)

• An important note:

• We can’t include dummy variables for both male and female in the same regression equation

– suppose we have Y = a + b1D1i + b2D2i + e

– where: D1i = 0 if male, D1i = 1 if female

D2i = 0 if female, D2i = 1 if male

– OLS won’t be able to estimate the regression coefficients because D1i and D2i sum to one for every observation, so together they show perfect multicollinearity with the intercept a

• So if a qualitative variable has m categories, you should include (m-1) dummy variables for it in the regression equation
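The problem with including both dummies shows up as a rank deficiency in the matrix of regressors. A minimal sketch, using six made-up observations:

```python
import numpy as np

female = np.array([0.0, 1.0, 1.0, 0.0, 1.0, 0.0])   # D1: 1 if female
male = 1.0 - female                                  # D2: 1 if male
const = np.ones_like(female)                         # intercept column

# Intercept plus both dummies: the two dummy columns add up to the constant.
X_trap = np.column_stack([const, female, male])
# Intercept plus one dummy (m - 1 = 1 dummy for m = 2 categories).
X_ok = np.column_stack([const, female])

rank_trap = np.linalg.matrix_rank(X_trap)
rank_ok = np.linalg.matrix_rank(X_ok)
print("columns:", X_trap.shape[1], "rank:", rank_trap)
print("columns:", X_ok.shape[1], "rank:", rank_ok)
```

Three columns with rank two means one coefficient cannot be separately identified; dropping either dummy restores full column rank.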

Example: marital status

• The spreadsheet (L17_1.xls) also estimates the following regression equation using two distinct dummy variables:

$$Y_i = a + b_1 D_{1i} + b_2 D_{2i} + b_3 X_i + e_i$$

– where: D1i = 0 if male, D1i = 1 if female

D2i = 0 if other, D2i = 1 if married

• Using the regression equation we can create four categories: married males, unmarried males, married females, and unmarried females

Example: marital status (2)

• Expected earnings for unmarried males:

$$E(Y_i \mid D_{1i} = 0, D_{2i} = 0) = a + b_3 X_i$$

• Expected earnings for unmarried females:

$$E(Y_i \mid D_{1i} = 1, D_{2i} = 0) = (a + b_1) + b_3 X_i$$

• Expected earnings for married males:

$$E(Y_i \mid D_{1i} = 0, D_{2i} = 1) = (a + b_2) + b_3 X_i$$

• Expected earnings for married females:

$$E(Y_i \mid D_{1i} = 1, D_{2i} = 1) = (a + b_1 + b_2) + b_3 X_i$$
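The four cases differ only in the intercept. The sketch below evaluates them with hypothetical coefficient values (chosen for illustration; they are not the estimates on L17_1.xls) and takes X to be age.

```python
def expected_earnings(a, b1, b2, b3, female, married, x):
    """E(Y | D1 = female, D2 = married) for Y = a + b1*D1 + b2*D2 + b3*X + e."""
    return a + b1 * female + b2 * married + b3 * x

# Hypothetical coefficients, not taken from the spreadsheet.
a, b1, b2, b3 = 5.0, -0.6, 0.2, 0.02
age = 30

groups = {
    "unmarried male": (0, 0),
    "unmarried female": (1, 0),
    "married male": (0, 1),
    "married female": (1, 1),
}
earnings = {
    name: expected_earnings(a, b1, b2, b3, f, m, age)
    for name, (f, m) in groups.items()
}
for name, value in earnings.items():
    print(f"{name}: {value:.2f}")
```

Because the model has no interaction between the two dummies, the female differential b1 is the same for married and unmarried people.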

Interactive terms

• So far we’ve only used dummy variables to change the intercept

• We can also use dummy variables to alter the partial slope coefficients

• Let’s think about this model:

Wi = a + b1Agei + b2Marriedi + ei

– we could argue that $\hat{a}$, $\hat{b}_1$, and $\hat{b}_2$ would be different for males and females

– we want to think about two sub-sample groups: males and females

– we can test the hypothesis that the partial slope coefficients will be different for these 2 groups

Interactive terms (2)

• To test our hypothesis we’ll estimate the regression equation for the whole sample and then for the two sub-sample groups

• We test to see if our estimated coefficients are the same between males and females

• Our null hypothesis is:

H0 : aM = aF, b1M = b1F, b2M = b2F

Interactive terms (3)

• We have an unrestricted form and a restricted form

– unrestricted: used when we estimate for the sub-sample groups separately

– restricted: used when we estimate for the whole sample

• What type of statistic will we use to carry out this test?

– F-statistic:

$$F = \frac{(SSR_R - SSR_U)/q}{SSR_U/(n_1 + n_2 - 2k)}$$

q = k, the number of parameters in the model

n = n1 + n2 where n is complete sample size

Interactive terms (4)

• The sum of squared residuals for the unrestricted form will be:

SSRU = SSRM + SSRF

• L17_2.xls

– the data is sorted according to the dummy variable “female”

– there is a second dummy variable for marital status

– there are 3 estimated regression equations, one each for the total sample, male sub-sample, and female sub-sample

Interactive terms (5)

• The output allows us to gather the necessary sum of squared residuals and sample sizes to construct the estimate:

$$F^* = \frac{(SSR_R - SSR_U)/q}{SSR_U/(n_1 + n_2 - 2k)} = \frac{[16.261 - (7.495 + 5.093)]/3}{(7.495 + 5.093)/(33 - 6)} = \frac{1.224}{0.466} = 2.626$$

– Since F0.05,3,27 = 2.96 > F* we cannot reject the null hypothesis that the partial slope coefficients are the same for males and females
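The arithmetic can be reproduced directly from the figures on this slide. In the sketch below the variable names are illustrative, and the critical value is the one quoted on the slide rather than one computed from an F distribution.

```python
# Values read from the slide's calculation.
ssr_restricted = 16.261              # whole-sample regression
ssr_male, ssr_female = 7.495, 5.093  # sub-sample regressions
n, k = 33, 3                         # total sample size, parameters per equation
q = k                                # number of restrictions

ssr_unrestricted = ssr_male + ssr_female
f_stat = ((ssr_restricted - ssr_unrestricted) / q) / (ssr_unrestricted / (n - 2 * k))
f_critical = 2.96                    # F(0.05; 3, 27) as quoted on the slide

print("F* =", round(f_stat, 3))
print("reject H0" if f_stat > f_critical else "cannot reject H0")
```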

Interactive terms (6)

• What if F* > F0.05,3, 27 ? How to read the results?

– There’s a difference between the two sub-samples and therefore we should estimate the wage equations separately

– Or we could interact the dummy variables with the other variables

• To interact the dummy variables with the age and marital status variables, we multiply the dummy variable by the age and marital status variables to get:

Wi = a + b1Agei + b2Marriedi + b3Di + b4(Di*Agei) + b5(Di*Marriedi) + ei


Interactive terms (7)

• Using L17_2.xls you can construct the interactive terms by multiplying the FEMALE column by the AGE and MARRIED columns

– one way to see if the two sub-samples are different is to look at the t-ratios on the interactive terms

– in this example, neither of the t-ratios is statistically significant so we can’t reject the null hypothesis

• We now know how to use dummy variables to indicate the importance of sub-sample groups within the data

– dummy variables are also useful for testing for structural breaks or regime shifts

Interactive terms (8)

• If we want to estimate the equation for the first sub-sample (males) we take the expectation of the wage equation where the dummy variable for female takes the value of zero:

E(Wi|Di = 0) = a + b1Agei + b2Marriedi

• We can do the same for the second sub-sample (females)

E(Wi|Di = 1) = (a + b3) + (b1 + b4)Agei + (b2 + b5)Marriedi

• We can see that by using only one regression equation, we have allowed the intercept and partial slope coefficients to vary by sub-sample
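A fully interacted regression reproduces the coefficients from separate male and female regressions. The sketch below illustrates this on simulated data; the sample, the coefficient values used to generate it, and the helper `ols` are all assumptions made for the example and are unrelated to L17_2.xls.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients for y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(1)
n = 200
age = rng.uniform(20, 60, size=n)
married = rng.integers(0, 2, size=n).astype(float)
female = rng.integers(0, 2, size=n).astype(float)
# Simulated wages with different intercepts and slopes by sex.
wage = (5.0 + 0.02 * age + 0.1 * married - 0.4 * female
        - 0.005 * female * age + 0.05 * female * married
        + rng.normal(0.0, 0.3, size=n))

ones = np.ones(n)
# Pooled regression with the dummy and both interactive terms.
X_pooled = np.column_stack(
    [ones, age, married, female, female * age, female * married]
)
a, b1, b2, b3, b4, b5 = ols(X_pooled, wage)

# Separate regressions on each sub-sample.
m = female == 0
f = female == 1
coef_m = ols(np.column_stack([ones[m], age[m], married[m]]), wage[m])
coef_f = ols(np.column_stack([ones[f], age[f], married[f]]), wage[f])

print("males:  ", coef_m, "vs", np.array([a, b1, b2]))
print("females:", coef_f, "vs", np.array([a + b3, b1 + b4, b2 + b5]))
```

The male coefficients match (a, b1, b2) and the female coefficients match (a + b3, b1 + b4, b2 + b5), so the t-ratios on b3, b4, and b5 in the pooled regression test the male-female differences directly.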

Phillips Curve example

• Phillips curve as an example of a regime shift.

• Data points from 1950 - 1970: There is a downward sloping, reciprocal relationship between wage inflation and unemployment

[Figure: wage inflation (W) plotted against unemployment (UN), 1950 - 1970]

Phillips Curve example (2)

• But if we look at data points from 1971 - 1996:

• From the data we can detect an upward sloping relationship

[Figure: wage inflation (W) plotted against unemployment (UN), 1971 - 1996]

Phillips Curve example (3)

• There seems to be a regime shift between the two periods

– note: this is an arbitrary choice of regime shift - it was not dictated by a specific change

• We will use the Chow Test (F-test) to test for this regime shift

– the test will use a restricted form:

$$W_t = a + b_2 \frac{1}{U_N}$$

– it will also use an unrestricted form:

$$W_t = a + b_1 D + b_2 \frac{1}{U_N} + b_3 D \frac{1}{U_N}$$

– D is the dummy variable for the regime shift, equal to 0 for 1950-1970 and 1 for 1971-1996

Phillips Curve example (4)

• L17_3.xls estimates the restricted regression equations and calculates the F-statistic for the Chow Test:

• The null hypothesis will be:

H0 : b1 = b3 = 0

– we are testing to see if the dummy variable for the regime shift alters the intercept or the slope coefficient

• The F-statistic is (* indicates restricted)

$$F = \frac{\left(\sum \hat{e}^{*2} - \sum \hat{e}^2\right)/q}{\sum \hat{e}^2/(n - k)}$$

where q = 2
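The same test can be sketched end to end on simulated data. Everything below is hypothetical: the unemployment and wage-inflation series are generated with a built-in shift in 1971, the helper `ssr` is introduced for the example, and k is taken to be the number of parameters in the unrestricted equation, which is an assumption about the slide's notation.

```python
import numpy as np

def ssr(X, y):
    """Sum of squared residuals from regressing y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

rng = np.random.default_rng(2)
years = np.arange(1950, 1997)
unemp = rng.uniform(3.0, 10.0, size=years.size)   # simulated unemployment rates
D = (years >= 1971).astype(float)                 # regime dummy: 0 for 1950-1970, 1 after
inv_u = 1.0 / unemp
# Simulated wage inflation whose intercept and slope both change in 1971.
w = (1.0 + 12.0 * inv_u + D * (6.0 - 20.0 * inv_u)
     + rng.normal(0.0, 0.5, size=years.size))

ones = np.ones(years.size)
X_r = np.column_stack([ones, inv_u])                 # restricted form
X_u = np.column_stack([ones, D, inv_u, D * inv_u])   # unrestricted form

ssr_r, ssr_u = ssr(X_r, w), ssr(X_u, w)
q, n, k = 2, years.size, X_u.shape[1]
F = ((ssr_r - ssr_u) / q) / (ssr_u / (n - k))
print("SSR restricted:", ssr_r, "SSR unrestricted:", ssr_u, "F:", F)
```

The resulting F would be compared with the critical value for (q, n - k) degrees of freedom; a value above it rejects H0 and indicates a regime shift.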

Phillips Curve example (5)

• The expectation of wage inflation for the first time period:

$$E(W_t \mid D = 0) = a + b_2 \frac{1}{U_N}$$

• The expectation of wage inflation for the second time period:

$$E(W_t \mid D = 1) = (a + b_1) + (b_2 + b_3) \frac{1}{U_N}$$

• You can use the spreadsheet data to carry out these calculations

What we’ve learned

• Multicollinearity

– linear relationship between independent variables

– examples

• Dummy variables

– way to include qualitative variables in regressions

– examples