Christopher Dougherty
EC220 - Introduction to econometrics (chapter 14)
Slideshow: regression analysis with panel data

Original citation: Dougherty, C. (2012) EC220 - Introduction to econometrics (chapter 14). [Teaching Resource]

© 2012 The Author

This version available at: http://learningresources.lse.ac.uk/140/
Available in LSE Learning Resources Online: May 2012

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License. This license allows the user to remix, tweak, and build upon the work even for commercial purposes, as long as the user credits the author and licenses their new creations under the identical terms. http://creativecommons.org/licenses/by-sa/3.0/

http://learningresources.lse.ac.uk/



REGRESSION ANALYSIS WITH PANEL DATA: INTRODUCTION

A panel data set, or longitudinal data set, is one where there are repeated observations on the same units.

The units may be individuals, households, enterprises, countries, or any set of entities that remain stable through time.


The National Longitudinal Survey of Youth is an example. The same respondents were interviewed every year from 1979 to 1994. Since 1994 they have been interviewed every two years.


A balanced panel is one where every unit is surveyed in every time period. The NLSY is unbalanced because some individuals have not been interviewed in some years. Some could not be located, some refused, and a few have died.
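The definition of a balanced panel can be checked mechanically: with n units and T time periods, a balanced panel contains exactly n × T observations. The following is a minimal sketch only; the function name `panel_summary` and the toy data are hypothetical and are not taken from the NLSY.

```python
def panel_summary(observations):
    """Summarise a panel given one (unit, period) pair per observation."""
    distinct = set(observations)             # guard against duplicate records
    units = {unit for unit, _ in distinct}
    periods = {period for _, period in distinct}
    potential = len(units) * len(periods)    # n * T
    return {
        "n": len(units),
        "T": len(periods),
        "potential": potential,
        "actual": len(distinct),
        "balanced": len(distinct) == potential,
    }

# Hypothetical toy panel: respondent 3 was not interviewed in 1981.
obs = [(1, 1979), (1, 1980), (1, 1981),
       (2, 1979), (2, 1980), (2, 1981),
       (3, 1979), (3, 1980)]
summary = panel_summary(obs)
print(summary)
```

A panel is reported as unbalanced as soon as a single unit-period combination is missing, which is the situation described for the NLSY.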


Panel data sets have several advantages over cross-section data sets:

• They may make it possible to overcome a problem of bias caused by unobserved heterogeneity.


• They make it possible to investigate dynamics without relying on retrospective questions that may yield data subject to measurement error.


• They are often very large. If there are n units and T time periods, the potential number of observations is nT.


• Because they tend to be expensive to undertake, they are often well designed and have high response rates. The NLSY is an example.


We will start with an example of the use of panel data to investigate simple dynamics. We will use data from the 1988 round of the NLSY for 1,538 males in full-time employment.

NLSY 1988 data
Dependent variable: LGEARN (standard errors in parentheses)

               (1)        (2)        (3)
MARRIED       0.129      0.163        –
             (0.024)    (0.028)
SOONMARR        –        0.096     –0.066
                        (0.037)    (0.034)
SINGLE          –          –       –0.163
                                   (0.028)
R²            0.271      0.274      0.274
n             1538       1538       1538


The first column of the table shows the result of regressing the logarithm of hourly earnings (LGEARN) on a dummy variable for being married and a set of control variables (years of schooling, ASVABC score, years of tenure and its square, years of work experience and its square, etc.; coefficients not shown).


Married males earn about 12.9 percent more than single males (the coefficient of MARRIED is 0.129 and the dependent variable is in logarithms), and the effect is highly significant (standard errors in parentheses).
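Reading a coefficient in a logarithmic earnings regression as a percentage is an approximation. The sketch below, using only the MARRIED estimate and standard error from the first column of the table, compares the approximate and exact proportional differences and computes the t ratio. The variable names are illustrative, and the figures are the rounded values shown in the table.

```python
import math

coef, se = 0.129, 0.024                   # MARRIED, first column of the table

t_stat = coef / se                        # t ratio for H0: coefficient = 0
approx_pct = 100 * coef                   # usual percentage reading of a log coefficient
exact_pct = 100 * (math.exp(coef) - 1)    # exact proportional difference

print(f"t ratio             : {t_stat:.2f}")
print(f"approximate premium : {approx_pct:.1f} percent")
print(f"exact premium       : {exact_pct:.1f} percent")
```

The exact figure is somewhat larger than the approximation, as is always the case for a positive coefficient, but the gap is small for an estimate of this size.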


The effect has often been found in the literature. One explanation is that marriage entails financial responsibilities — in particular, the raising of children — that may encourage men to work harder or seek better paying jobs.


Another is that certain unobserved qualities that are valued by employers are also valued by potential spouses and hence are conducive to getting married. According to this explanation the dummy variable for being married is acting as a proxy for these qualities.


Other explanations have been proposed, but we will restrict attention to these two. With cross-sectional data it is difficult to discriminate between them.


However, with panel data one can find out whether there is an uplift at the time of marriage or soon after, as would be predicted by the increased productivity hypothesis, or whether men who end up married tend to earn more even when unmarried, as would be predicted by the unobserved heterogeneity hypothesis.


We define a second dummy variable SOONMARR equal to 1 if the respondent was single in 1988 but married within the next four years. The omitted category consists of those who were single in 1988 and still single four years later.


Under the null hypothesis that the marital effect is dynamic and marriage encourages men to earn more, the coefficient of SOONMARR should be 0 because the men in this category were still single as of 1988.


The t statistic implied by the table is 0.096/0.037 ≈ 2.59, so the coefficient is significantly different from 0 at the 1 percent level (though not at the 0.1 percent level), leading us to reject the null hypothesis at that level.


However, if the alternative hypothesis (unobserved heterogeneity) is true, the coefficient of SOONMARR should be equal to that of MARRIED, but it is lower (0.096 against 0.163).


To test whether it is significantly lower, the easiest method is to change the reference category to those who were married by 1988 and to introduce a new dummy variable SINGLE that is equal to 1 if the respondent was single in 1988 and still single four years later.
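The three dummy variables split the sample into mutually exclusive groups, so exactly one of them equals 1 for each respondent. The sketch below shows one way they could be constructed. It is an illustration only: the function name and the boolean inputs `married_1988` and `married_by_1992` are hypothetical, and the slides do not describe how the NLSY records were actually coded.

```python
def marital_dummies(married_1988, married_by_1992):
    """Return (MARRIED, SOONMARR, SINGLE) for one respondent.

    married_1988    -- True if married by the 1988 interview (assumed input)
    married_by_1992 -- True if married at some point by 1992 (assumed input)
    """
    married = int(married_1988)
    soonmarr = int(not married_1988 and married_by_1992)
    single = int(not married_1988 and not married_by_1992)
    return married, soonmarr, single

# Each respondent belongs to exactly one category.
for status in [(True, True), (False, True), (False, False)]:
    dummies = marital_dummies(*status)
    assert sum(dummies) == 1
    print(status, dummies)
```

Including MARRIED and SOONMARR makes the still-single group the reference category; including SOONMARR and SINGLE instead makes those married by 1988 the reference category. All three cannot be included together with an intercept, because they sum to 1.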


The coefficient of SOONMARR now estimates the difference between the coefficients of those married by 1988 and those married within the next four years, and if the second hypothesis is true, it should be equal to 0.
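A short substitution shows why the SOONMARR coefficient becomes a difference when the reference category changes. The symbols below are introduced here for illustration and do not appear in the slides: $\beta_M$ and $\beta_S$ are the coefficients of MARRIED and SOONMARR when the still-single group is the reference category, and $u$ is the disturbance term. Because every respondent falls into exactly one group, MARRIED + SOONMARR + SINGLE = 1, so MARRIED can be replaced by 1 − SOONMARR − SINGLE:

```latex
\begin{align*}
\mathit{LGEARN} &= \beta_1 + \beta_M\,\mathit{MARRIED} + \beta_S\,\mathit{SOONMARR} + \text{controls} + u \\
                &= \beta_1 + \beta_M\,(1 - \mathit{SOONMARR} - \mathit{SINGLE}) + \beta_S\,\mathit{SOONMARR} + \text{controls} + u \\
                &= (\beta_1 + \beta_M) + (\beta_S - \beta_M)\,\mathit{SOONMARR} - \beta_M\,\mathit{SINGLE} + \text{controls} + u
\end{align*}
```

With the tabulated estimates, $\beta_S - \beta_M$ is 0.096 − 0.163 = −0.067, which agrees with the reported −0.066 up to rounding, and $-\beta_M$ = −0.163 matches the reported SINGLE coefficient.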


The t statistic is –1.93, so we (just) do not reject the second hypothesis at the 5 percent level. Since the first hypothesis was rejected and the second was not, the evidence is more compatible with the second hypothesis, but it is possible that neither hypothesis is correct on its own and the truth might reside in some compromise.
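Both tests on SOONMARR can be recomputed from the table. The sketch below is a rough check, assuming a two-sided test with the large-sample critical value 1.96 (an assumption not stated in the slides). Because the table entries are rounded, the ratios can differ from statistics computed on unrounded estimates.

```python
def t_ratio(coef, se):
    """t ratio for the null hypothesis that the coefficient is zero."""
    return coef / se

# SOONMARR with the still-single group as reference category (column 2)
t_vs_single = t_ratio(0.096, 0.037)
# SOONMARR with those married by 1988 as reference category (column 3)
t_vs_married = t_ratio(-0.066, 0.034)

CRIT_5PCT = 1.96  # assumed two-sided, large-sample critical value

for label, t in [("SOONMARR vs still single", t_vs_single),
                 ("SOONMARR vs married by 1988", t_vs_married)]:
    verdict = "reject" if abs(t) > CRIT_5PCT else "do not reject"
    print(f"{label}: t = {t:.2f} -> {verdict} H0 at the 5 percent level")
```

The second ratio falls only slightly short of the critical value, which is why the non-rejection is described as marginal.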

The starting point for a discussion of regression models using panel data is an equation of the form

$$Y_{it} = \beta_1 + \sum_{j=2}^{k} \beta_j X_{jit} + \sum_{p=1}^{s} \gamma_p Z_{pi} + \delta t + \varepsilon_{it}$$

where the $X_j$ variables are observed and the $Z_p$ variables are unobserved.

The index $i$ refers to the unit of observation, $t$ refers to the time period, and $j$ and $p$ are used to differentiate between different observed and unobserved explanatory variables. $\varepsilon_{it}$ is a disturbance term assumed to satisfy the regression model assumptions.

A trend term $t$, with coefficient $\delta$, has been introduced to allow for a shift of the intercept over time. If the implicit assumption of a constant rate of change seems too strong, the trend can be replaced by a set of dummy variables, one for each time period except the reference period.
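Replacing the trend by period dummies can be sketched as follows. This is a minimal illustration under stated assumptions: the helper `time_dummies` and the sample years are hypothetical, and the earliest period is taken as the reference period unless another is given.

```python
def time_dummies(periods, reference=None):
    """Build one 0/1 column per time period, omitting the reference period."""
    levels = sorted(set(periods))
    if reference is None:
        reference = levels[0]              # assumption: earliest period is the reference
    kept = [p for p in levels if p != reference]
    rows = [[int(t == p) for p in kept] for t in periods]
    return kept, rows

# Hypothetical period index for two units observed in three years.
years = [1988, 1989, 1990, 1988, 1989, 1990]
columns, rows = time_dummies(years)
print(columns)
for year, row in zip(years, rows):
    print(year, row)
```

With T periods this produces T − 1 columns, so each period other than the reference has its own intercept shift instead of a single common slope on $t$.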

The $X_j$ variables are usually the variables of interest, while the $Z_p$ variables are responsible for unobserved heterogeneity and as such constitute a nuisance component of the model.

Note that the unobserved heterogeneity is assumed to be unchanging and accordingly the $Z_p$ variables do not have a time subscript.

Because the $Z_p$ variables are unobserved, there is no means of obtaining information about the $\sum_{p} \gamma_p Z_{pi}$ component of the model, and it is convenient to define a term $\alpha_i$, known as the unobserved effect, representing the joint impact of the $Z_p$ variables on $Y_i$:

$$\alpha_i = \sum_{p=1}^{s} \gamma_p Z_{pi}$$
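The unobserved effect is a single number per unit and has no time subscript, because neither the $\gamma_p$ nor the $Z_{pi}$ vary over time. The following toy calculation illustrates this. All values are hypothetical; in practice the $Z_p$ variables are unobserved, so this sum cannot actually be computed from data.

```python
# Hypothetical coefficients gamma_p for s = 2 unobserved variables.
gamma = [0.5, -0.2]

# Hypothetical time-invariant Z_pi values for two units.
Z = {
    "unit_a": [1.0, 3.0],
    "unit_b": [2.0, 0.5],
}

# alpha_i = sum over p of gamma_p * Z_pi: one value per unit, the same in every period.
alpha = {unit: sum(g * z for g, z in zip(gamma, zs)) for unit, zs in Z.items()}

for unit, value in alpha.items():
    print(unit, round(value, 3))
```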

Hence we can rewrite the regression model as

$$Y_{it} = \beta_1 + \sum_{j=2}^{k} \beta_j X_{jit} + \alpha_i + \delta t + \varepsilon_{it}$$

The characterization of the $\alpha_i$ component will be seen to be crucially important in what follows.

First, however, note that if the $X_j$ controls are so comprehensive that they capture all the relevant characteristics of the individual, there will be no relevant unobserved characteristics.

In that case the $\alpha_i$ term may be dropped and pooled OLS may be used to fit the model, treating all the observations for all of the time periods as a single sample.
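Pooled OLS simply stacks every unit-period observation into one sample and fits an ordinary regression. The sketch below illustrates this on simulated data, under assumptions chosen for brevity that are not in the slides: a single regressor, no trend, no unobserved effect, and hypothetical parameter values and panel dimensions.

```python
import random

random.seed(0)
n, T = 200, 5                  # hypothetical panel dimensions
beta1, beta2 = 1.0, 0.5        # hypothetical true intercept and slope

# Stack all (i, t) observations into one sample. No alpha_i term is generated,
# which is the case in which pooled OLS is appropriate.
xs, ys = [], []
for i in range(n):
    for t in range(T):
        x = random.gauss(0, 1)
        xs.append(x)
        ys.append(beta1 + beta2 * x + random.gauss(0, 0.5))

# Simple-regression OLS formulas applied to the pooled sample.
N = len(xs)
x_bar, y_bar = sum(xs) / N, sum(ys) / N
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
b2 = s_xy / s_xx
b1 = y_bar - b2 * x_bar

print(f"pooled observations: {N}")
print(f"intercept estimate : {b1:.3f}")
print(f"slope estimate     : {b2:.3f}")
```

The sample size is n × T rather than n, which is the sense in which all periods are treated as a single sample.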

Copyright Christopher Dougherty 2011.

These slideshows may be downloaded by anyone, anywhere for personal use. Subject to respect for copyright and, where appropriate, attribution, they may be used as a resource for teaching an econometrics course. There is no need to refer to the author.

The content of this slideshow comes from Section 14.1 of C. Dougherty, Introduction to Econometrics, fourth edition 2011, Oxford University Press. Additional (free) resources for both students and instructors may be downloaded from the OUP Online Resource Centre http://www.oup.com/uk/orc/bin/9780199567089/.

Individuals studying econometrics on their own and who feel that they might benefit from participation in a formal course should consider the London School of Economics summer school course EC212 Introduction to Econometrics http://www2.lse.ac.uk/study/summerSchools/summerSchool/Home.aspx or the University of London International Programmes distance learning course 20 Elements of Econometrics www.londoninternational.ac.uk/lse.

11.07.25