PROXY VARIABLES

$$Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_k X_k + u$$

Suppose that a variable Y is hypothesized to depend on a set of explanatory variables X2, ..., Xk as shown above, and suppose that for some reason there are no data on X2.


As we have seen, a regression of Y on X3, ..., Xk would yield biased estimates of the coefficients and invalid standard errors and tests.
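A small simulation makes the omitted-variable problem concrete. This is a sketch under assumed values: the true coefficients, the correlation between X2 and X3, and the sample size are all hypothetical, and NumPy's least-squares routine stands in for a regression package.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000
beta1, beta2, beta3 = 1.0, 2.0, 0.5        # hypothetical true coefficients

x3 = rng.normal(size=n)
x2 = 0.8 * x3 + rng.normal(size=n)          # X2 is correlated with X3
y = beta1 + beta2 * x2 + beta3 * x3 + rng.normal(size=n)

ones = np.ones(n)
# Regression including X2 (possible only if X2 were observed)
full, *_ = np.linalg.lstsq(np.column_stack([ones, x2, x3]), y, rcond=None)
# Regression omitting X2
short, *_ = np.linalg.lstsq(np.column_stack([ones, x3]), y, rcond=None)

print(f"b3 with X2 included: {full[2]:.3f}")   # close to beta3 = 0.5
print(f"b3 with X2 omitted:  {short[1]:.3f}")  # close to 0.5 + 2.0 * 0.8 = 2.1
```

In this design the large-sample bias in b3 is β2 times the slope of X2 on X3 (2.0 × 0.8 = 1.6), which is why the second estimate lands near 2.1 rather than 0.5.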


Sometimes, however, these problems can be reduced or eliminated by using a proxy variable in the place of X2. A proxy variable is one that is hypothesized to be linearly related to the missing variable. In the present example, Z could act as a proxy for X2.


$$X_2 = \lambda + \mu Z$$


The validity of the proxy relationship must be justified on the basis of theory, common sense, or experience. It cannot be checked directly because there are no data on X2.


If a suitable proxy has been identified, the regression model can be rewritten as shown.


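$$
\begin{aligned}
Y &= \beta_1 + \beta_2(\lambda + \mu Z) + \beta_3 X_3 + \dots + \beta_k X_k + u \\
  &= (\beta_1 + \beta_2\lambda) + \beta_2\mu Z + \beta_3 X_3 + \dots + \beta_k X_k + u
\end{aligned}
$$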


We thus obtain a model in which all the variables are observable. If the proxy relationship is exact and we fit this model, most of the regression results will be rescued.


The estimates of the coefficients of X3, ..., Xk will be the same as those that would have been obtained if it had been possible to regress Y on X2, ..., Xk.


The standard errors and t statistics of the coefficients of X3, ..., Xk will be the same as those that would have been obtained if it had been possible to regress Y on X2, ..., Xk.


R² will be the same as it would have been if it had been possible to regress Y on X2, ..., Xk.
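The first three results can be checked numerically. The sketch below is an illustration under stated assumptions, not the author's code: it simulates data with a hypothetical exact proxy X2 = 3 + 2Z, fits the infeasible regression on X2 and the feasible one on Z with a small hand-written OLS helper (`ols`, defined here for the example), and compares b3, its standard error, and R².

```python
import numpy as np

def ols(y, X):
    """OLS of y on X (X includes a column of ones): coefficients, standard errors, t, R^2."""
    n, k = X.shape
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    s2 = resid @ resid / (n - k)                        # estimate of the disturbance variance
    se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
    r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return coef, se, coef / se, r2

rng = np.random.default_rng(1)
n = 500
lam, mu = 3.0, 2.0                  # hypothetical proxy relationship X2 = lam + mu*Z
z = rng.normal(size=n)
x2 = lam + mu * z                   # exact proxy: no error term
x3 = 0.5 * z + rng.normal(size=n)
y = 1.0 + 0.7 * x2 + 0.4 * x3 + rng.normal(size=n)

ones = np.ones(n)
b_x2, se_x2, t_x2, r2_x2 = ols(y, np.column_stack([ones, x2, x3]))  # infeasible: needs X2
b_z, se_z, t_z, r2_z = ols(y, np.column_stack([ones, z, x3]))       # feasible: uses Z

print(b_x2[2], b_z[2])     # b3 from the two regressions
print(se_x2[2], se_z[2])   # standard error of b3
print(r2_x2, r2_z)         # R^2
```

The equalities hold because, with an exact proxy, (1, Z, X3) spans the same column space as (1, X2, X3), so both regressions give identical fitted values and residuals, and the reparametrization does not touch the X3 column.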


The coefficient of Z will be an estimate of μβ2, and so it will not be possible to obtain an estimate of β2 unless you are able to guess the value of μ.


However, the t statistic for Z will be the same as that which would have been obtained for X2 if it had been possible to regress Y on X2, ..., Xk (its sign is reversed if μ is negative), and so you are able to assess the significance of X2 even if you are not able to estimate its coefficient.
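Results 4 and 5 can be illustrated with simulated data. In this sketch the values, including the exact proxy X2 = 3 + 2Z, are hypothetical, and `ols_t` is a helper defined only for the example. The coefficient of Z equals μ times the coefficient of X2, so b2 is recoverable only when μ is known, while the t statistics coincide because μ is positive here.

```python
import numpy as np

def ols_t(y, X):
    """Return OLS coefficients and t statistics (X includes a column of ones)."""
    n, k = X.shape
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    s2 = resid @ resid / (n - k)
    se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
    return coef, coef / se

rng = np.random.default_rng(3)
n = 500
lam, mu = 3.0, 2.0                  # hypothetical exact proxy: X2 = lam + mu*Z
z = rng.normal(size=n)
x2 = lam + mu * z
x3 = 0.5 * z + rng.normal(size=n)
y = 1.0 + 0.7 * x2 + 0.4 * x3 + rng.normal(size=n)

ones = np.ones(n)
b_x2, t_x2 = ols_t(y, np.column_stack([ones, x2, x3]))
b_z, t_z = ols_t(y, np.column_stack([ones, z, x3]))

print(b_z[1], mu * b_x2[1])   # coefficient of Z equals mu * b2
print(b_z[1] / mu)            # b2 can be backed out only because mu is known here
print(t_z[1], t_x2[1])        # t statistics for Z and X2 coincide (mu > 0)
```

Setting `mu` to a negative value in this sketch leaves the magnitude of the t statistic unchanged but flips its sign.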


It will not be possible to obtain an estimate of β1, since the intercept in the revised model is (β1 + β2λ), but usually β1 is of relatively little interest anyway.

Comparison of regression with Z instead of X2

1. b3, ..., bk same
2. S.e. and t for b3, ..., bk same
3. R² same
4. Not possible to obtain an estimate of β2, unless μ known
5. t statistic for Z same as that for X2
6. Not possible to obtain an estimate of β1
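Results 4 and 6 follow from substituting the proxy relationship into the fitted equation. A short derivation, assuming the exact relationship X2 = λ + μZ and writing b1, ..., bk for the estimates that the (infeasible) regression on X2 would give:

$$
\begin{aligned}
\hat{Y} &= b_1 + b_2(\lambda + \mu Z) + b_3 X_3 + \dots + b_k X_k \\
        &= (b_1 + b_2\lambda) + \mu b_2 Z + b_3 X_3 + \dots + b_k X_k
\end{aligned}
$$

Because the feasible regression on Z reproduces exactly these fitted values, its intercept is b1 + b2λ and its Z coefficient is μb2. Separating b2 from μ, or b1 from b2λ, requires outside knowledge of λ and μ.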


It is generally more realistic to hypothesize that the relationship between X2 and Z is approximate, rather than exact. In that case the results listed above will hold approximately.


However, if Z is a poor proxy for X2, the results will effectively be subject to measurement error (see Chapter 8). Further, some of the other X variables may partly act as proxies for X2, in which case there will still be a problem of omitted variable bias.
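The effect of an imperfect proxy can also be simulated. This sketch assumes, hypothetically, that Z measures X2 with additive noise (Z = X2 + σv), so the proxy relationship holds only approximately, and that X2 is correlated with X3. As σ grows, X3 picks up part of the effect of X2 and b3 drifts away from its true value of 0.5, while the coefficient of Z is attenuated.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000
beta2, beta3 = 1.0, 0.5                       # hypothetical true coefficients
x3 = rng.normal(size=n)
x2 = 0.8 * x3 + rng.normal(size=n)            # unobserved X2, correlated with X3
y = 2.0 + beta2 * x2 + beta3 * x3 + rng.normal(size=n)

ones = np.ones(n)
results = {}
for sigma in (0.0, 1.0, 3.0):                 # sigma = 0 is an exact proxy
    z = x2 + sigma * rng.normal(size=n)       # approximate proxy for X2
    b, *_ = np.linalg.lstsq(np.column_stack([ones, z, x3]), y, rcond=None)
    results[sigma] = b
    # In this design the large-sample b3 is beta3 + 0.8*beta2*sigma**2/(1 + sigma**2)
    # and the large-sample b_Z is beta2/(1 + sigma**2)
    print(f"sigma = {sigma}: b_Z = {b[1]:.3f}, b3 = {b[2]:.3f}")
```

With σ = 0 the proxy is exact and b3 is close to 0.5; at σ = 3 it is close to 1.22 in this design, illustrating how X3 ends up acting as a partial proxy for X2.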


The use of a proxy variable will be illustrated with an educational attainment model. We will suppose that educational attainment depends jointly on cognitive ability and family background.


$$S = \beta_1 + \beta_2\,\mathit{ASVABC} + \beta_3\,\mathit{INDEX} + u$$


As usual, ASVABC will be used as the measure of cognitive ability. However, there is no ‘family background’ variable in the data set. Indeed, it is difficult to conceive how such a variable might be defined.


Instead, we will try to find a proxy. One obvious variable is the mother's educational attainment, SM. However, father's educational attainment, SF, may also be relevant. So we will hypothesize that the family background index depends on both.


$$\mathit{INDEX} = \lambda + \mu_1\,\mathit{SM} + \mu_2\,\mathit{SF}$$


Thus we obtain a relationship expressing S as a function of ASVABC, SM, and SF.

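$$
\begin{aligned}
S &= \beta_1 + \beta_2\,\mathit{ASVABC} + \beta_3(\lambda + \mu_1\,\mathit{SM} + \mu_2\,\mathit{SF}) + u \\
  &= (\beta_1 + \beta_3\lambda) + \beta_2\,\mathit{ASVABC} + \beta_3\mu_1\,\mathit{SM} + \beta_3\mu_2\,\mathit{SF} + u
\end{aligned}
$$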


Here is the corresponding regression using EAEF Data Set 21.

```text
. reg S ASVABC SM SF

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  3,   536) =  104.30
       Model |  1181.36981     3  393.789935           Prob > F      =  0.0000
    Residual |  2023.61353   536  3.77539837           R-squared     =  0.3686
-------------+------------------------------           Adj R-squared =  0.3651
       Total |  3204.98333   539  5.94616574           Root MSE      =   1.943

------------------------------------------------------------------------------
           S |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ASVABC |   .1257087   .0098533    12.76   0.000     .1063528    .1450646
          SM |   .0492424   .0390901     1.26   0.208     -.027546    .1260309
          SF |   .1076825   .0309522     3.48   0.001       .04688    .1684851
       _cons |   5.370631   .4882155    11.00   0.000      4.41158    6.329681
------------------------------------------------------------------------------
```

Here is the regression of S on ASVABC alone.

```text
. reg S ASVABC

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  1,   538) =  274.19
       Model |  1081.97059     1  1081.97059           Prob > F      =  0.0000
    Residual |  2123.01275   538  3.94612035           R-squared     =  0.3376
-------------+------------------------------           Adj R-squared =  0.3364
       Total |  3204.98333   539  5.94616574           Root MSE      =  1.9865

------------------------------------------------------------------------------
           S |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ASVABC |    .148084   .0089431    16.56   0.000     .1305165    .1656516
       _cons |   6.066225   .4672261    12.98   0.000     5.148413    6.984036
------------------------------------------------------------------------------
```


A comparison of the regressions indicates that the coefficient of ASVABC is biased upwards if we make no attempt to control for family background.


This is what we should expect. Both SM and SF are likely to have positive effects on educational attainment, and they are both positively correlated with ASVABC.

```text
. cor ASVABC SM SF
(obs=570)

        |   ASVABC       SM       SF
--------+---------------------------
  ASVABC|   1.0000
      SM|   0.4202   1.0000
      SF|   0.4090   0.6241   1.0000
```
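The size and direction of the change can be checked from the reported numbers. The sketch below only does arithmetic on figures copied from the Stata outputs; treating the estimated SM and SF coefficients as stand-ins for their true effects is an assumption. With ASVABC as the only included regressor, each omitted variable biases its coefficient in the direction given by the sign of its effect times the sign of its correlation with ASVABC.

```python
# Figures copied from the Stata outputs
b_asvabc_short = 0.148084      # reg S ASVABC
b_asvabc_long = 0.1257087      # reg S ASVABC SM SF

coef = {"SM": 0.0492424, "SF": 0.1076825}          # from reg S ASVABC SM SF
corr_with_asvabc = {"SM": 0.4202, "SF": 0.4090}    # from cor ASVABC SM SF

shift = b_asvabc_short - b_asvabc_long
print(f"shift in ASVABC coefficient: {shift:.4f} ({shift / b_asvabc_long:.1%})")

# Direction of the bias from omitting each variable:
# sign(effect on S) * sign(correlation with ASVABC)
for name in coef:
    direction = "up" if coef[name] * corr_with_asvabc[name] > 0 else "down"
    print(f"omitting {name} biases the ASVABC coefficient {direction}")
```

Both products are positive, consistent with the upward shift of roughly 0.022 seen when SM and SF are dropped.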


LIBRARY (a dummy variable equal to 1 if anyone in the family owned a library card when the respondent was 14) and SIBLINGS (number of brothers and sisters of the respondent) are two other variables in the data set which might act as proxies for family background.

```text
. reg S ASVABC SM SF LIBRARY SIBLINGS

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  5,   534) =   63.21
       Model |  1191.57546     5  238.315093           Prob > F      =  0.0000
    Residual |  2013.40787   534  3.77042672           R-squared     =  0.3718
-------------+------------------------------           Adj R-squared =  0.3659
       Total |  3204.98333   539  5.94616574           Root MSE      =  1.9418

------------------------------------------------------------------------------
           S |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ASVABC |   .1245327   .0099875    12.47   0.000      .104913    .1441523
          SM |   .0388414    .039969     0.97   0.332    -.0396743    .1173571
          SF |   .1035001   .0311842     3.32   0.001     .0422413    .1647588
     LIBRARY |  -.0355224   .2134634    -0.17   0.868    -.4548534    .3838086
    SIBLINGS |  -.0665348   .0408795    -1.63   0.104    -.1468392    .0137696
       _cons |   5.846517   .5681221    10.29   0.000     4.730489    6.962546
------------------------------------------------------------------------------
```


The LIBRARY variable was one of three variables included in the National Longitudinal Survey of Youth to help pick up the influence of family background on education. Surprisingly, it has a negative coefficient, but it is not significant.


There is a tendency for parents who are ambitious for their children to limit their number, so SIBLINGS should be expected to have a negative coefficient. It does, but it is also not significant.
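The significance verdicts for LIBRARY and SIBLINGS can be reproduced from their coefficients and standard errors. This sketch uses 1.96 as an approximation to the two-sided 5% critical value of t with 534 degrees of freedom; that approximation is an assumption of the example.

```python
# Coefficients and standard errors copied from the Stata output for
# reg S ASVABC SM SF LIBRARY SIBLINGS
estimates = {
    "LIBRARY": (-0.0355224, 0.2134634),
    "SIBLINGS": (-0.0665348, 0.0408795),
}
T_CRIT_5PCT = 1.96   # normal approximation to the two-sided 5% critical value of t(534)

t_stats = {}
for name, (coef, se) in estimates.items():
    t = coef / se                       # t statistic for H0: coefficient = 0
    t_stats[name] = t
    verdict = "significant" if abs(t) > T_CRIT_5PCT else "not significant"
    print(f"{name}: t = {t:.2f} -> {verdict} at the 5% level")
```

The computed values, -0.17 and -1.63, match the t column of the output, and both fall short of the critical value in absolute terms.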


There are further background variables which may be relevant for educational attainment: faith, ethnicity, and region of residence. These variables are supplied in the data set, but it will be left to you to experiment with them.


Copyright Christopher Dougherty 2012.

These slideshows may be downloaded by anyone, anywhere for personal use. Subject to respect for copyright and, where appropriate, attribution, they may be used as a resource for teaching an econometrics course. There is no need to refer to the author.

The content of this slideshow comes from Section 6.5 of C. Dougherty, Introduction to Econometrics, fourth edition 2011, Oxford University Press. Additional (free) resources for both students and instructors may be downloaded from the OUP Online Resource Centre http://www.oup.com/uk/orc/bin/9780199567089/.

Individuals studying econometrics on their own who feel that they might benefit from participation in a formal course should consider the London School of Economics summer school course EC212 Introduction to Econometrics (http://www2.lse.ac.uk/study/summerSchools/summerSchool/Home.aspx) or the University of London International Programmes distance learning course EC2020 Elements of Econometrics (www.londoninternational.ac.uk/lse).

2012.11.09