panel data (ch. 10) the recommended exercise …miniahn/ecn425/cn7.pdf · panel data (ch. 10) ......

Post on 07-Feb-2018


PANEL DATA (Ch. 10)

The recommended exercise questions from the textbook:

• Chapter 10: All except (10.6), (10.10).

[1] What are panel data?

• Panel data consist of observations on the same n entities at two or more time periods, t = 1, ..., T. If the data set contains observations on the variables X and Y, then the data are denoted

$$(X_{it}, Y_{it}), \quad i = 1, \dots, n \ \text{ and } \ t = 1, \dots, T,$$

where the first subscript, i, refers to the entity being observed, and the second subscript, t, refers to the date at which it is observed.

• Balanced panel vs. unbalanced panel.

• Balanced panel: All variables are observed for each entity in each time period.

• Unbalanced panel: Some data are missing for at least one entity in at least one time period.

• We consider the analysis of balanced panels only. The extension to unbalanced panels is straightforward.


[2] Revisiting Omitted Variables Biases

• Issue:

• Do alcohol taxes help decrease traffic deaths?

• Data: fatality.wf1

• 48 U.S. states (excluding Alaska and Hawaii): N = 48.

• 1982-1988: T = 7.

• fatality rate = # of traffic accident deaths per 10,000 people.

• beertax = tax per case of beer ($).

• Estimation results for the 1982 data (standard errors in parentheses):

FatalityRate = 2.01 + 0.15BeerTax
              (0.15) (0.13)

• Estimation results for the 1988 data:

FatalityRate = 1.86 + 0.44BeerTax
              (0.11) (0.13)


• What is going on here?

• Consider a simple multiple regression model (for a given time t):

Yit = β0 + β1Xit + β2Zi + uit, i = 1, ... , N,

where Zi is a time-invariant regressor.

• What do β1 and β2 measure?

β1 measures the partial effect of Xit on Yit with Zi held constant. Similarly, β2 measures the partial effect of Zi on Yit with Xit held constant.

• If you estimate Yit = α0 + α1Xit + errorit instead?

• $\hat{\alpha}_1 \xrightarrow{p} \beta_1 + \beta_2 \dfrac{\mathrm{cov}(X_{it}, Z_i)}{\mathrm{var}(X_{it})}$

• Each state would have a different level of preference for alcohol

(say, Zi = Pal).

• Pal (Z) and BeerTax (X) could be positively related: $\mathrm{cov}(X_{it}, Z_i) > 0$.

• Pal (Z) would have a positive partial effect on FatalityRate (β2 > 0).

• Thus, $\hat{\alpha}_1$ could be positive even if the true β1 is negative.

• How could we control Pal using panel data?
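The sign reversal described above can be checked with a small simulation. This is only a sketch with artificial data: the parameter values, the sample size, and the way X is tied to Z are assumptions made for illustration, not estimates from fatality.wf1.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
beta0, beta1, beta2 = 2.0, -0.5, 2.0      # hypothetical true values (beta1 < 0)

z = rng.normal(size=n)                    # omitted factor, e.g. Pal
x = 0.8 * z + rng.normal(size=n)          # regressor positively related to z
y = beta0 + beta1 * x + beta2 * z + rng.normal(size=n)

# Slope from the short regression of y on x alone (z omitted)
var_x = np.var(x, ddof=1)
alpha1_hat = np.cov(x, y)[0, 1] / var_x

# Value implied by the omitted-variable-bias formula
alpha1_formula = beta1 + beta2 * np.cov(x, z)[0, 1] / var_x

print(alpha1_hat, alpha1_formula)
```

With these assumed values the short-regression slope comes out positive although the true β1 is negative, and it agrees closely with the value given by the bias formula.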


[3] Panel Data with Two Time Periods

• Two equations for 1982 and 1988:

FatalityRatei,1988 = β0 + β1BeerTaxi,1988 + β2Zi + ui,1988.

FatalityRatei,1982 = β0 + β1BeerTaxi,1982 + β2Zi + ui,1982.

→ FatalityRatei,1988 – FatalityRatei,1982

= β1(BeerTaxi,1988 – BeerTaxi,1982) + (ui,1988 – ui,1982). (1)

• No Zi in (1)! Because Zi does not change over time, it drops out of the difference, and OLS on (1) will yield a consistent estimator of β1.

• Actual estimation results for (1):

FatalityRate1988 – FatalityRate1982 = -0.072 – 1.04(BeerTax1988 – BeerTax1982)
                                      (0.065) (0.36)
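The before-and-after idea can also be illustrated on artificial data. The sketch below assumes a two-period panel in which an unobserved, time-invariant Z is correlated with X; all parameter values and the helper ols_slope are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000
beta0, beta1, beta2 = 2.0, -1.0, 2.0      # hypothetical true values

z = rng.normal(size=n)                    # time-invariant, unobserved
x1 = 0.8 * z + rng.normal(size=n)         # regressor in period 1
x2 = 0.8 * z + rng.normal(size=n)         # regressor in period 2
y1 = beta0 + beta1 * x1 + beta2 * z + rng.normal(size=n)
y2 = beta0 + beta1 * x2 + beta2 * z + rng.normal(size=n)


def ols_slope(x, y):
    """Slope from an OLS regression of y on x with an intercept."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)


b_cross = ols_slope(x2, y2)               # single cross-section, z omitted
b_diff = ols_slope(x2 - x1, y2 - y1)      # regression in differences

print(b_cross, b_diff)
```

In this setup the cross-section slope is pulled toward zero by the omitted Z, while the slope from the differenced regression is close to the assumed β1.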


• Comments on the before-and-after estimation results.

• As real beer tax increases by $1 per case, the traffic fatality rate

falls by 1.04 deaths per 10,000 people.

→ This is a big effect, because the mean traffic fatality rate is approximately 2 deaths per 10,000 people.

• This before-and-after approach works well if T = 2. What should

we do if T > 2?


[4] Fixed Effects Regression

(A) A simple regression model:

Yit = β0 + β1Xit + β2Zi + uit, i = 1, ... , N, t = 1, ... , T. (1)

• Set αi = β0 + β2Zi. Then, we have

Yit = β1Xit + αi + uit, (2)

which is called the “fixed effects regression model.”

• For the i’th cross-sectional entity, the regression line is (2). The

slope coefficient β1 is the same for all i, but the intercept terms αi

are different across different i (but constant over time).

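• Equivalently, write the model with entity dummy variables:

Yit = β0 + β1Xit + γ2D2i + γ3D3i + ... + γnDni + uit, (3)

where i = 1, ... , n, t = 1, ..., T (nT observations),

$$D2_i = \begin{cases} 1, & \text{if } i \text{ is the 2nd entity;} \\ 0, & \text{otherwise,} \end{cases}$$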

and other dummy variables D3, ..., Dn are similarly defined.

• In (3), α1 = β0, α2 = β0 + γ2, ... , αn = β0 + γn.

• The slope coefficient β1 and n other parameters (β0, γ2, ..., γn) can

be estimated by OLS on model (3).


• “Entity-demeaned” OLS algorithm

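• Yit = β1Xit + αi + uit

$$\bar{Y}_i = \beta_1 \bar{X}_i + \alpha_i + \bar{u}_i, \quad \text{where } \bar{Y}_i = \frac{1}{T} \sum_{t=1}^{T} Y_{it}.$$

------------------------------------

$$(Y_{it} - \bar{Y}_i) = \beta_1 (X_{it} - \bar{X}_i) + (u_{it} - \bar{u}_i). \qquad (4)$$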

• OLS estimator of β1 from (4) = OLS estimator of β1 from (3).
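The equality of the two estimators can be verified numerically. The following is a minimal sketch on a small artificial balanced panel; the sizes n and T and all parameter values are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n, T = 5, 4                                # small artificial balanced panel
beta1 = -0.7                               # hypothetical true slope

entity = np.repeat(np.arange(n), T)        # entity index of each observation
alpha = rng.normal(size=n)                 # entity-specific intercepts
x = alpha[entity] + rng.normal(size=n * T)
y = beta1 * x + alpha[entity] + rng.normal(size=n * T)

# Model (3): OLS on an intercept, x, and the dummies D2, ..., Dn
D = (entity[:, None] == np.arange(1, n)[None, :]).astype(float)
W = np.column_stack([np.ones(n * T), x, D])
coef, *_ = np.linalg.lstsq(W, y, rcond=None)
b_dummy = coef[1]

# Model (4): subtract entity means, then OLS without an intercept
x_bar = np.bincount(entity, weights=x) / T
y_bar = np.bincount(entity, weights=y) / T
x_dm = x - x_bar[entity]
y_dm = y - y_bar[entity]
b_within = (x_dm @ y_dm) / (x_dm @ x_dm)

print(b_dummy, b_within)
```

The two printed slopes coincide up to floating-point error, whatever seed or parameter values are used.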

• Least Squares Assumptions for the fixed effects model:

(FEA.1) $E(u_{it} \mid X_{i1}, X_{i2}, \dots, X_{iT}, \alpha_i) = 0$.

(FEA.2) The data, $(X_{i1}, \dots, X_{iT}, Y_{i1}, \dots, Y_{iT})$, i = 1, ..., n, are a random sample.

(FEA.3) $(X_{it}, \alpha_i)$ have nonzero finite fourth moments: large outliers are unlikely.

(FEA.4) There is no perfect multicollinearity.

(FEA.5) No autocorrelation: $\mathrm{cov}(u_{it}, u_{is} \mid X_{i1}, \dots, X_{iT}, \alpha_i) = 0$ for all $t \neq s$.

For multiple regressions, Xit should be replaced by the full list X1,it, …, Xk,it.

• What happens if (FEA.5) is violated?


(B) Extension to multiple X’s.

• The fixed effects regression model is

Yit = β1X1,it + ... + βkXk,it + αi + uit, (5)

where i = 1, ... , n, and t = 1, ... , T.

• Equivalently, the fixed effects model can be written as

Yit = β0 + β1X1,it + ... + βkXk,it + γ2D2i + ... + γnDni + uit. (6)

• “Entity-demeaned” algorithm

$$(Y_{it} - \bar{Y}_i) = \beta_1 (X_{1,it} - \bar{X}_{1,i}) + \dots + \beta_k (X_{k,it} - \bar{X}_{k,i}) + (u_{it} - \bar{u}_i). \qquad (7)$$

• OLS estimators of β1, ... , βk from (7) = OLS estimators of β1, ... ,

βk from (6).

(C) Application to Traffic Deaths.

• Fixed effects regression results:

FatalityRate = -0.66BeerTax + StateFixedEffects.
              (0.20)


[5] Time and Entity Fixed Effects Model

(1) Motivation.

• Return to our FatalityRate example:

Yit = β0 + β1Xit + β2Zi + β3St + uit,

where Yit = FatalityRate; Xit = BeerTax;

Zi = time-invariant preferences for alcohol or driving of the people in State i;

St = time-specific effects (common to all states), such as overall automobile safety improvements.

• Let

$$B1_t = \begin{cases} 1, & \text{if } t \text{ is the first time period;} \\ 0, & \text{otherwise.} \end{cases}$$

Define dummy variables B2t, ... , BTt similarly.

(2) Time and Entity Fixed Effects Model:

Yit = β0 + β1X1,it + ... + βkXk,it + γ2D2i + ... + γnDni

+ δ2B2t + ... + δTBTt + uit.

• This model has many regressors. We can still get reasonably accurate estimates of β1, ... , βk, but the estimates of γ2, ... , γn and δ2, ... , δT are inaccurate.
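The dummy-variable regression in (2) can be sketched on artificial data as follows. The comparison with two-way demeaning (subtracting entity means and period means and adding back the grand mean) is a standard result for balanced panels that is not stated in these notes; it is included only as a cross-check. All sizes and parameter values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n, T = 6, 5                                # small artificial balanced panel
beta1 = -0.6                               # hypothetical true slope

entity = np.repeat(np.arange(n), T)        # 0,0,...,1,1,...
period = np.tile(np.arange(T), n)          # 0,1,...,T-1,0,1,...
a = rng.normal(size=n)                     # entity effects
s = rng.normal(size=T)                     # time effects
x = a[entity] + s[period] + rng.normal(size=n * T)
y = beta1 * x + a[entity] + s[period] + rng.normal(size=n * T)

# OLS with intercept, x, entity dummies D2..Dn and time dummies B2..BT
D = (entity[:, None] == np.arange(1, n)).astype(float)
B = (period[:, None] == np.arange(1, T)).astype(float)
W = np.column_stack([np.ones(n * T), x, D, B])
coef, *_ = np.linalg.lstsq(W, y, rcond=None)
b_lsdv = coef[1]


def two_way_demean(v):
    """Remove entity means and period means, add back the grand mean."""
    m = v.reshape(n, T)
    out = m - m.mean(axis=1, keepdims=True) - m.mean(axis=0, keepdims=True) + m.mean()
    return out.ravel()


x_dd, y_dd = two_way_demean(x), two_way_demean(y)
b_demeaned = (x_dd @ y_dd) / (x_dd @ x_dd)

print(b_lsdv, b_demeaned)
```

Both routes give the same slope on a balanced panel, so the many dummy coefficients never need to be reported when only β1, ... , βk are of interest.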

(3) Application to traffic death

FatalityRate = -0.64BeerTax + StateFixedEffects + TimeFixedEffects.
              (0.25)


[6] Drunk Driving Laws and Traffic Death

• Would driving laws and economic conditions matter?

• Drinking-age and drunk-driving laws do not matter very much.

• Economic factors are important.

• Specification (4) of Table 10.1 is the base model.

• Average tax = $0.5/case,

and average fatality rate = 2 per 10,000 people.

• As tax increases by $0.5, fatality rate drops 0.45×0.5 = 0.225 (per

10,000).

→ But this result is somewhat imprecise: the 95% confidence interval for the effect of BeerTax is

−0.45 ± 1.96 × 0.22 = (−0.88, −0.02),

which is quite wide.


[7] Eviews Exercise

(1) Exercise with an artificial panel data set named “artificial_panel.xls.”

There are four variables in the Excel file: “country”, “year”, “y”, and “x”. Each variable has 12 observations, from the 3rd row to the 14th row. The data are artificial numbers for three countries: US, Japan and Korea. Notice that the variable “country” is alphabetic, not numeric.

STEP 1: Open artificial_panel.xls using Excel. Then, using your mouse, select the data and copy them.

STEP 2: Open Eviews. Then, type the following in the Eviews command window (the narrow white window below the File, Edit, Object buttons):

create u 12 (enter)

Then, a workfile window will pop up.


Type the following in the Eviews command window:

alpha country (enter)
data year y x (enter)

The command “alpha” is used to create alphabetic variables, while “data” is for numeric variables.

Then, a spreadsheet will pop up.

Close the window by clicking on the X in its top-right corner. Eviews will ask you whether you want to delete the Untitled Group. Click on the Yes button.

STEP 3: On the workfile, click on the Show button. Then, a Show window will pop up. Type in the window:

country year y x

Click on OK. Then, a spreadsheet will pop up.

Click on the Edit+/- button and place your cursor on the first cell of the country column. Then right-click and paste the copied data.

Then, you will see that the data from the excel file are pasted to the

spreadsheet.

Close the spreadsheet by clicking on the X in its top-right corner. Eviews will ask you whether you want to delete the Untitled Group. Click on the Yes button.

STEP 4: On the workfile, push the Save button. Choose the drive and folder where you want to save the file, and enter the file name “artificial_panel.wf1”.

Click on the save button. Then, a “Workfile Save” window will pop

up. Just click on the ok button.

Then, you will be back to the workfile.

STEP 5: On the workfile, push the Proc button. Choose Structure/Resize

Current Page…


Then you will have the Workfile Structure window. Choose Dated

Panel. Then, you will have the following screen.


Type 2001 for Start date, 2004 for End date, country for Cross-section ID series, and year for Date series. Then, click on OK.

Then, you will be back to the workfile. Save it!

STEP 6: Push the Objects/New Object... button. Choose Equation and enter art_pan as the name of the object. Then, an Equation Estimation window will pop up. Type “y x” in the Equation specification box.

And click on Panel Options.

Choose “Fixed” for Cross-section, “Fixed” for Period, and “White (diagonal)” for Coef covariance method.

By choosing “Fixed” for Cross-section, you are doing regression with

dummy variables for individual entities. By choosing “Fixed” for

Period, you are adding time dummy variables into regression.


STEP 7: Choose view/Fixed/Random Effects/Cross-section Effects.

Then you will have:


Choose view/Fixed/Random Effects/Period Effects.


Choose view/Fixed/Random Effects Testing/Redundant Fixed Effects.


I found that the F and χ2 statistics for the individual dummy variables and the

time dummy variables are computed assuming the error terms in the

regression models are homoskedastic over i and t. So, the results are not

reliable if the error terms are in fact heteroskedastic.

If you would like to test whether the time effects are statistically significant, I suggest that you estimate your model choosing None for Period but including the time dummy variables as regressors, and then test their joint significance.

(2) Exercise with fatality.wf1.

-----------------------------------------------------------------
variable name   variable label
-----------------------------------------------------------------
state           State ID (FIPS) Code
year            Year
spircons        Spirits Consumption
unrate          Unemployment Rate
perinc          Per Capita Personal Income
emppop          Employment/Population Ratio
beertax         Tax on Case of Beer
sobapt          % Southern Baptist
mormon          % Mormon
mlda            Minimum Legal Drinking Age
dry             % Residing in Dry Counties
yngdrv          % of Drivers Aged 15-24
vmiles          Ave. Mile per Driver
vmilespd        Ave. Mile per 1,000 Driver
breath          Prelim. Breath Test Law
jaild           Mandatory Jail Sentence
comserd         Mandatory Community Service
jailcom         jaild + comserd
allmort         # of Vehicle Fatalities (#VF)
mrall           Vehicle Fatality Rate (VFR) = #VF/Population
vfrall          10,000*mrall = VFR per 10,000 people
allnite         # of Night-time VF (#NVF)
mralln          Night-time VFR (NVFR)
allsvn          # of Single VF (#SVF)
a1517           #NVF, 15-17 year olds
mra1517n        NVFR, 15-17 year olds
a1829           #VF, 18-20 year olds
a1820n          #NVF, 18-20 year olds
mra1820         VFR, 18-20 year olds
mra1820n        NVFR, 18-20 year olds
a2124           #VF, 21-24 year olds
mra2124         VFR, 21-24 year olds
a2124n          #NVF, 21-24 year olds
mra2124n        NVFR, 21-24 year olds
aidall          # of alcohol-involved VF
da18            Dummy variable for drinking age = 18
da19            Dummy variable for drinking age = 19
da20            Dummy variable for drinking age = 20
lincperc        Log of per capita real income
mraidall        Alcohol-Involved VFR
pop             Population
pop1517         Population, 15-17 year olds
pop1820         Population, 18-20 year olds
pop2124         Population, 21-24 year olds
miles           total vehicle miles (millions)
unus            U.S. unemployment rate
epopus          U.S. Emp/Pop Ratio
gspch           GSP Rate of Change
Dum1982
Dum1983
Dum1984
:
DUM1988
-----------------------------------------------------------------

• Estimation of specification (4) in Table 10.1 on p. 368.

Dependent Variable: VFRALL
Sample: 1982 1988
Cross-sections included: 48
Total panel (balanced) observations: 336
White diagonal standard errors & covariance (d.f. corrected)

Variable    Coefficient   Std. Error   t-Statistic   Prob.

C            -2.327171     1.316419    -1.767804     0.0782
BEERTAX      -0.450272     0.222005    -2.028203     0.0435
DA18          0.027509     0.065473     0.420158     0.6747
DA19         -0.019096     0.039510    -0.483315     0.6293
DA20          0.030875     0.045689     0.675767     0.4998
JAILD         0.012644     0.031940     0.395866     0.6925
COMSERD       0.034135     0.114820     0.297289     0.7665
VMILESPD      0.008226     0.008368     0.983073     0.3264
LINCPERC      1.814889     0.472220     3.843312     0.0002
UNRATE       -0.063043     0.011616    -5.427345     0.0000
DUM1982       0.533926     0.075931     7.031706     0.0000
DUM1983       0.435841     0.070418     6.189300     0.0000
DUM1984       0.246723     0.050392     4.896067     0.0000
DUM1985       0.155325     0.043688     3.555327     0.0004
DUM1986       0.189843     0.040808     4.652090     0.0000
DUM1987       0.087532     0.032452     2.697246     0.0074

Effects Specification

Cross-section fixed (dummy variables)

R-squared            0.939540    Mean dependent var   2.040444
Adjusted R-squared   0.925809    S.D. dependent var   0.570194
Log likelihood       183.8646    F-statistic          68.42532
Durbin-Watson stat   1.733929    Prob(F-statistic)    0.000000

• Testing significance of the individual and time dummy variables:

[Estimation choosing “Fixed” for period and not using dummy variables as regressors.]

Redundant Fixed Effects Tests
Equation: MIN
Test cross-section and period fixed effects

Effects Test                       Statistic     d.f.       Prob.

Cross-section F                    44.772106     (47,273)   0.0000
Cross-section Chi-square           727.186063    47         0.0000
Period F                           19.685127     (6,273)    0.0000
Period Chi-square                  120.798386    6          0.0000
Cross-Section/Period F             40.398468     (53,273)   0.0000
Cross-Section/Period Chi-square    732.351587    53         0.0000

• Testing significance of the time dummy variables:

[Estimation choosing “None” for period and using dummy variables as regressors.]

Wald Test:
Equation: MIN

Test Statistic   Value      df         Probability

F-statistic      11.46715   (6, 273)   0.0000
Chi-square       68.80287   6          0.0000

Comments on (FEA.5):

• What if Assumption #5 fails, so that corr(uit,uis|Xit,Xis,αi) ≠ 0?

• OLS panel data estimators of β1 are unbiased and consistent.

• The OLS standard errors will be wrong.

• Use “heteroskedasticity and autocorrelation-consistent standard errors” (clustered standard errors).

• The clustered SE formula is NOT the usual (hetero-robust) SE formula! [Appendix 10.2 (pp. 379 – 381)].

• The clustered SE might not be very accurate if N is small.

• Eviews can compute these!

• In Eviews, choose “White period” instead of “White (diagonal)”.
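To see why the two standard errors can differ, here is a rough sketch that computes both for the entity-demeaned estimator with one regressor on artificial data. It is not the Eviews implementation: no degrees-of-freedom corrections are applied, and the AR(1) design, the helper ar1, and all parameter values are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
n, T = 2_000, 7
beta1, rho = -0.5, 0.8                     # hypothetical slope and AR(1) coefficient


def ar1(rng, n, T, rho):
    """n independent stationary AR(1) series of length T, one per row."""
    e = np.empty((n, T))
    e[:, 0] = rng.normal(size=n) / np.sqrt(1 - rho**2)
    for t in range(1, T):
        e[:, t] = rho * e[:, t - 1] + rng.normal(size=n)
    return e


alpha = rng.normal(size=(n, 1))            # entity effects
x = alpha + ar1(rng, n, T, rho)            # serially correlated regressor
u = ar1(rng, n, T, rho)                    # serially correlated errors
y = beta1 * x + alpha + u

# Entity-demeaned OLS
x_dm = x - x.mean(axis=1, keepdims=True)
y_dm = y - y.mean(axis=1, keepdims=True)
sxx = (x_dm**2).sum()
b = (x_dm * y_dm).sum() / sxx
resid = y_dm - b * x_dm

score = x_dm * resid                       # one term per (i, t)
# Heteroskedasticity-robust: treats every (i, t) term as independent
se_white = np.sqrt((score**2).sum()) / sxx
# Clustered by entity: sums the terms within each entity before squaring
se_cluster = np.sqrt((score.sum(axis=1) ** 2).sum()) / sxx

print(b, se_white, se_cluster)
```

The only difference between the two formulas is whether the products within an entity are summed before squaring, which is what lets the clustered version pick up correlation of uit over time. In this design the clustered standard error comes out larger than the heteroskedasticity-robust one.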


• Estimation of specification (7) in Table 10.1 on p. 368.

Dependent Variable: VFRALL
Sample: 1982 1988
Cross-sections included: 48
Total panel (balanced) observations: 336
White period standard errors & covariance (d.f. corrected)

Variable    Coefficient   Std. Error   t-Statistic   Prob.

C            -2.327171     1.915400    -1.214979     0.2254
BEERTAX      -0.450272     0.319805    -1.407961     0.1603
DA18          0.027509     0.075267     0.365483     0.7150
DA19         -0.019096     0.053288    -0.358351     0.7204
DA20          0.030875     0.054076     0.570957     0.5685
JAILD         0.012644     0.017699     0.714386     0.4756
COMSERD       0.034135     0.142797     0.239043     0.8113
VMILESPD      0.008226     0.007355     1.118432     0.2644
LINCPERC      1.814889     0.683535     2.655150     0.0084
UNRATE       -0.063043     0.013984    -4.508168     0.0000
DUM1982       0.533926     0.098541     5.418291     0.0000
DUM1983       0.435841     0.091540     4.761205     0.0000
DUM1984       0.246723     0.064103     3.848852     0.0001
DUM1985       0.155325     0.054832     2.832774     0.0050
DUM1986       0.189843     0.042774     4.438265     0.0000
DUM1987       0.087532     0.032445     2.697841     0.0074

Effects Specification

Cross-section fixed (dummy variables)

R-squared            0.939540    Mean dependent var   2.040444
Adjusted R-squared   0.925809    S.D. dependent var   0.570194
Durbin-Watson stat   1.733929    Prob(F-statistic)    0.000000

• Average tax = $0.5/case,

and average fatality rate = 2 per 10,000 people.

• As tax increases by $0.5, fatality rate drops 0.45×0.5 = 0.225 (per

10,000).

→ The 95% confidence interval for the effect of BeerTax is

−0.45 ± 1.96 × 0.32 = (−1.08, 0.18),

which is wider than (−0.88, −0.02) and now includes zero.
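The two intervals can be reproduced with a few lines of arithmetic. The sketch below uses the rounded coefficient (−0.45) and the rounded standard errors (0.22 and 0.32) quoted in these notes; the helper name ci95 is hypothetical.

```python
def ci95(estimate, std_error, z=1.96):
    """Two-sided 95% confidence interval based on the normal approximation."""
    half_width = z * std_error
    return estimate - half_width, estimate + half_width


beertax_coef = -0.45                        # rounded BEERTAX coefficient

ci_white = ci95(beertax_coef, 0.22)         # White (diagonal) standard error
ci_period = ci95(beertax_coef, 0.32)        # White period (clustered) standard error

print([round(v, 2) for v in ci_white])      # [-0.88, -0.02]
print([round(v, 2) for v in ci_period])     # [-1.08, 0.18]
```

The point estimate is the same in both cases; only the standard error, and therefore the width of the interval, changes.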
