Modeling Panel Data: Choosing the Correct Strategy

Roberto G. Gutierrez


2 / 25 #analyticsx

Overview

Panel data are ubiquitous, not only in economics but across many fields

Panel data have intrinsic modeling advantages

You model panel data in SAS with the PANEL procedure

Different model alternatives depending on assumptions and properties

Key new features in SAS/ETS 14.1


Panel Data

Panel data consist of a set of individuals measured over several points in time

Known by other names: longitudinal data, cross-sectional time series, clustered data, multilevel (two-level) data, etc.

Data collected in this manner offer key design advantages to modeling

The greatest advantage is that the individuals act as their own control group


Panel Regression Model

Formally, consider the linear regression model

$$y_{it} = \beta_0 + \beta_x X_{it} + \beta_z Z_i + \nu_i + \epsilon_{it}$$

for $i = 1, \ldots, N$ individuals on $t = 1, \ldots, T_i$ time periods.

The X variables vary over time

The Z variables are constant within individuals

The $\nu_i$ are individual (or cross section) effects

The $\epsilon_{it}$ are the observation-level errors

The estimation strategy depends on what you are willing to assume about X, Z, $\nu_i$, and $\epsilon_{it}$


Grocery Data

Consumer-loyalty data from 330 households who shopped at a grocery chain in Raleigh, North Carolina

Monthly expenditures for the year 2011; some monthly data missing

Model meat expenditures on the following factors:

  - Received government assistance for that month?
  - Household size
  - Rural store location visited during the month?
  - Were at least 10% of total expenditures for alcohol?
  - The number of meals per week outside household, as provided on initial survey

Specifically, assess the association with government assistance controlling for the other factors, and for latent “household effects”


Data Statement

data Grocery;
   input HouseID Month Meat Govt Hsize Rural Alcohol MealsOut;
   datalines;
1 1 55.841 1 5 0 1 3
1 3 49.372 1 5 0 1 3
1 4 59.43 1 5 0 1 3
1 5 52.25 1 5 0 1 3
1 6 41.623 1 5 0 0 3
1 7 59.357 1 5 0 1 3
1 9 58.512 1 5 0 0 3
...
330 9 55.264 1 4 0 0 2
330 10 55.096 1 4 0 0 2
330 12 49.676 1 4 0 0 2
;
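Because some household-months are missing, it can help to confirm the panel structure before modeling. The sketch below assumes the complete Grocery data set has been read in (the listing above is elided), sorts it by household and month to match the order of the ID variables, and tabulates how many months each household contributes. The names MonthsPerHouse and NMonths are made up for this illustration.

```sas
/* Sort by household, then month, to match the order of the ID variables. */
proc sort data=Grocery;
   by HouseID Month;
run;

/* Count the months observed for each household (hypothetical table name). */
proc sql;
   create table MonthsPerHouse as
   select HouseID, count(*) as NMonths
   from Grocery
   group by HouseID;
quit;

/* Distribution of panel lengths; values below 12 indicate missing months. */
proc freq data=MonthsPerHouse;
   tables NMonths;
run;
```

A household such as HouseID 1 in the listing, which has no rows for months 2 and 8, would show up here with fewer than 12 months.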


Random-Effects Estimation

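The model is

$$\text{Meat}_{it} = \beta_0 + \beta_1 \text{Govt}_{it} + \beta_2 \text{Hsize}_i + \beta_3 \text{Rural}_{it} + \beta_4 \text{Alcohol}_{it} + \beta_5 \text{MealsOut}_i + \nu_i + \epsilon_{it}$$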

Random-effects estimation is the most common strategy

Treats the households as a random sample and the $\nu_i$ as uncorrelated with X, Z, and $\epsilon_{it}$

A Hausman test is provided as a check on that assumption

Estimated by generalized least squares (GLS), so it is also known as the GLS estimator
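For reference, the Hausman test compares the random-effects and fixed-effects estimates of the coefficients that both methods can estimate. In its textbook form (a sketch in generic notation, not necessarily the exact computation PROC PANEL performs), the statistic is

$$m = (\hat\beta_{FE} - \hat\beta_{RE})' \left[ \widehat{\mathrm{Var}}(\hat\beta_{FE}) - \widehat{\mathrm{Var}}(\hat\beta_{RE}) \right]^{-1} (\hat\beta_{FE} - \hat\beta_{RE})$$

Under the null hypothesis that the individual effects are uncorrelated with the regressors, $m$ is approximately chi-square with $k$ degrees of freedom, where $k$ is the number of coefficients compared. A large $m$ (small p-value) is evidence against the random-effects assumption.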


Random-Effects Estimation

proc panel data = Grocery;
   id HouseID Month;
   model Meat = Govt Hsize Rural Alcohol MealsOut / ranone;
run;

Wansbeek and Kapteyn Variance Components (RanOne)
Dependent Variable: Meat (Meat purchases per store visit)

Model Description
  Estimation Method          RanOne
  Number of Cross Sections   330
  Time Series Length         12

Fit Statistics
  SSE        84930.9948    DFE        3567
  MSE        23.8102       Root MSE   4.8796
  R-Square   0.1232


Random-Effects Estimation

Wansbeek and Kapteyn Variance Components (RanOne)
Dependent Variable: Meat (Meat purchases per store visit)

Variance Component Estimates
  Variance Component for Cross Sections   190.123
  Variance Component for Error            24.99832

Hausman Test for Random Effects
  Coefficients   DF   m Value   Pr > m
  3              3    25.72     <.0001

Parameter Estimates
  Variable    DF   Estimate   Standard Error   t Value   Pr > |t|   Label
  Intercept   1    20.50606   2.3327             8.79    <.0001     Intercept
  Govt        1    5.050562   0.5989             8.43    <.0001     1 if used government assistance that month
  Hsize       1    5.145648   0.4774            10.78    <.0001     Household size
  Rural       1    -1.41068   0.3449            -4.09    <.0001     1 if rural location visited at least once
  Alcohol     1    2.982397   0.1960            15.22    <.0001     1 if at least 10% spent on alcohol
  MealsOut    1    -2.82761   0.3848            -7.35    <.0001     Meals per week outside of household (survey)
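One way to read the variance components is through the standard textbook view of random-effects GLS as least squares on partially demeaned data. The notation $\sigma_\nu^2$ (cross-section variance), $\sigma_\epsilon^2$ (error variance), and $\theta_i$ is introduced here for illustration and does not appear in the output. Each variable is transformed as $y_{it} - \theta_i \bar{y}_i$, with

$$\theta_i = 1 - \sqrt{\frac{\sigma_\epsilon^2}{\sigma_\epsilon^2 + T_i \, \sigma_\nu^2}}$$

Plugging in the reported estimates for a household observed in all 12 months:

$$\theta = 1 - \sqrt{\frac{24.99832}{24.99832 + 12 \times 190.123}} = 1 - \sqrt{\frac{24.99832}{2306.474}} \approx 1 - 0.104 = 0.896$$

A value of $\theta = 0$ would correspond to pooled least squares and $\theta = 1$ to full demeaning (fixed effects). With $\theta$ near 0.9, the random-effects fit removes most, but not all, of each household's mean. The same estimates imply that roughly $190.123 / (190.123 + 24.99832) \approx 0.88$ of the unexplained variance is between households.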


Correlated Individual Effects

The Hausman test calls the random-effects results into question

The problem is that the individual effects are likely correlated with the explanatory variables

This does not happen in designed experiments, but who has that these days?

Does the regression coefficient on Govt represent

  A. The effect of a household going on government assistance; or
  B. A comparison of two different households, one on government assistance throughout and one not?

What do you want it to represent?


Fixed-Effects Estimation

Fixed-effects estimation does not assume that individual effects are uncorrelated

It produces regression coefficients that are based solely on within-household comparisons

Equivalent to inserting a dummy regressor for each household

You lose some efficiency from not using any between-household data

Regressors are required to vary within households
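The points above follow from the within transformation. As a short derivation, write $\nu_i$ for the individual effect, $\epsilon_{it}$ for the observation-level error, and a bar for a household mean over its observed months. Averaging the model within household $i$ gives

$$\bar{y}_i = \beta_0 + \beta_x \bar{X}_i + \beta_z Z_i + \nu_i + \bar{\epsilon}_i$$

and subtracting this from the original equation gives

$$y_{it} - \bar{y}_i = \beta_x (X_{it} - \bar{X}_i) + (\epsilon_{it} - \bar{\epsilon}_i)$$

Both $\nu_i$ and $Z_i$ cancel. The cancellation of $\nu_i$ is why no assumption about its correlation with the regressors is needed; the cancellation of $Z_i$ is why coefficients on regressors that are constant within a household cannot be estimated.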


Fixed-Effects Estimation

proc panel data = Grocery;
   id HouseID Month;
   model Meat = Govt Hsize Rural Alcohol MealsOut / fixone;
run;
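To illustrate the dummy-regressor equivalence, the following sketch fits the same within-household regression outside PROC PANEL, using the ABSORB statement of PROC GLM. This is an illustration rather than part of the original analysis; the slope estimates for Govt, Rural, and Alcohol should agree with the FIXONE results, while the intercept is handled differently and is not reported here.

```sas
/* ABSORB requires the data to be sorted by the absorbed variable. */
proc sort data=Grocery;
   by HouseID Month;
run;

/* Absorbing HouseID has the same effect on the slopes as adding one
   dummy regressor per household, without estimating 330 dummy
   coefficients explicitly. Hsize and MealsOut are omitted because
   they do not vary within a household. */
proc glm data=Grocery;
   absorb HouseID;
   model Meat = Govt Rural Alcohol / solution;
run;
quit;
```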


Fixed-Effects Estimation

Fixed One-Way Estimates
Dependent Variable: Meat (Meat purchases per store visit)

F Test for No Fixed Effects
  Num DF   Den DF   F Value   Pr > F
  329      3240     32.06     <.0001

Parameter Estimates
  Variable    DF   Estimate   Standard Error   t Value   Pr > |t|   Label
  Intercept   1    53.89442   1.6500            32.66    <.0001     Intercept
  Govt        1    3.591205   0.6650             5.40    <.0001     1 if used government assistance that month
  Hsize       0    0          .                   .      .          Household size
  Rural       1    -1.45444   0.3578            -4.07    <.0001     1 if rural location visited at least once
  Alcohol     1    2.992035   0.2013            14.86    <.0001     1 if at least 10% spent on alcohol
  MealsOut    0    0          .                   .      .          Meals per week outside of household (survey)


Between-Groups Estimation

Rarely useful, but provided for comparison

proc panel data = Grocery;
   id HouseID Month;
   model Meat = Govt Hsize Rural Alcohol MealsOut / btwng;
run;
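Conceptually, the between-groups estimator discards all within-household variation and regresses household means on household means. The sketch below reproduces that idea by hand. It is an illustration under the assumption of a plain unweighted regression on the means; PROC PANEL may treat households with missing months differently, so the results should be close to, but are not guaranteed to match, the BTWNG output. HouseMeans is a hypothetical data set name.

```sas
/* One row per household, holding the mean of every model variable. */
proc means data=Grocery noprint nway;
   class HouseID;
   var Meat Govt Hsize Rural Alcohol MealsOut;
   output out=HouseMeans mean=;
run;

/* Ordinary least squares on the household means. */
proc reg data=HouseMeans;
   model Meat = Govt Hsize Rural Alcohol MealsOut;
run;
quit;
```

In the collapsed data, Govt becomes the share of observed months in which a household received assistance, so its coefficient compares different households rather than tracking a change within one household.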


Between-Groups Estimation

Between-Groups Estimates
Dependent Variable: Meat (Meat purchases per store visit)

Parameter Estimates
  Variable    DF   Estimate   Standard Error   t Value   Pr > |t|   Label
  Intercept   1    16.98442   1.7004             9.99    <.0001     Intercept
  Govt        1    13.40059   0.9886            13.56    <.0001     1 if used government assistance that month
  Hsize       1    5.092447   0.3032            16.80    <.0001     Household size
  Rural       1    0.005439   1.4038             0.00    0.9969     1 if rural location visited at least once
  Alcohol     1    1.082457   1.7681             0.61    0.5408     1 if at least 10% spent on alcohol
  MealsOut    1    -2.67669   0.2629           -10.18    <.0001     Meals per week outside of household (survey)


Hausman-Taylor Estimation

Random effects: All regressors uncorrelated with $\nu_i$

Fixed effects: They might all be correlated

Hausman-Taylor: Why not stipulate some regressors as correlated, and have the best of both worlds?

Choose the correlated variables based on substantive knowledge, or make a guess and check it with the test that is provided

Estimation is done using instrumental variables, determined internally

This is a new feature of SAS/ETS 14.1


Hausman-Taylor Estimation

proc panel data = Grocery;
   id HouseID Month;
   instruments correlated = (Govt MealsOut);
   model Meat = Govt Hsize Rural Alcohol MealsOut / htaylor;
run;
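As a sketch of what the INSTRUMENTS statement implies, the textbook Hausman-Taylor setup splits the regressors into four groups. The subscripts below are notation introduced for this illustration: $X_1$ and $Z_1$ are the time-varying and time-invariant regressors assumed uncorrelated with the individual effects, and $X_2$ and $Z_2$ are the ones declared correlated. For the grocery model that gives

$$X_1 = \{\text{Rural}, \text{Alcohol}\}, \quad X_2 = \{\text{Govt}\}, \quad Z_1 = \{\text{Hsize}\}, \quad Z_2 = \{\text{MealsOut}\}$$

The textbook instrument set is then

$$\underbrace{X_{1,it} - \bar{X}_{1,i}, \;\; X_{2,it} - \bar{X}_{2,i}}_{\text{within deviations}}, \qquad \bar{X}_{1,i}, \qquad Z_{1,i}$$

The household means $\bar{X}_{1,i}$ serve as instruments for $Z_2$, so identification requires at least as many variables in $X_1$ as in $Z_2$. Here that condition holds with one to spare (two versus one). PROC PANEL determines its instruments internally, and its construction may differ in detail from this textbook version.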


Hausman-Taylor Estimation

Hausman and Taylor Model for Correlated Individual Effects (HTaylor)
Dependent Variable: Meat (Meat purchases per store visit)

Variance Component Estimates
  Variance Component for Cross Sections   97.29627
  Variance Component for Error            24.97519

Hausman Test against Fixed Effects
  Coefficients   DF   m Value   Pr > m
  3              1    0.76      0.3824

Parameter Estimates
  Variable    Type   DF   Estimate   Standard Error   t Value   Pr > |t|   Label
  Intercept          1    19.12589   2.4038             7.96    <.0001     Intercept
  Govt        C      1    3.583391   0.6649             5.39    <.0001     1 if used government assistance that month
  Hsize       TI     1    5.17389    0.3523            14.68    <.0001     Household size
  Rural              1    -1.43991   0.3573            -4.03    <.0001     1 if rural location visited at least once
  Alcohol            1    2.974996   0.2004            14.85    <.0001     1 if at least 10% spent on alcohol
  MealsOut    C TI   1    -1.92242   0.8090            -2.38    0.0175     Meals per week outside of household (survey)

C: correlated with the individual effects
TI: constant (time-invariant) within cross sections


The COMPARE Statement

The COMPARE statement is another new feature of PROC PANEL in SAS/ETS 14.1

Makes it easy to compare various models and estimators side by side

proc panel data = Grocery;
   id HouseID Month;
   instruments correlated = (Govt MealsOut);
   model Meat = Govt Hsize Rural Alcohol MealsOut /
         ranone fixone btwng htaylor;
   compare;
run;


The COMPARE Statement

Model Comparison
Dependent Variable: Meat (Meat purchases per store visit)

Comparison of Model Parameter Estimates
                         Model 1     Model 1     Model 1     Model 1
  Variable               FixOne      RanOne      HTaylor     BtwGrps
  Intercept   Estimate   53.894415   20.506060   19.125895   16.984418
              Std Err     1.649992    2.332669    2.403772    1.700415
  Govt        Estimate    3.591205    5.050562    3.583391   13.400587
              Std Err     0.665025    0.598942    0.664876    0.988573
  Hsize       Estimate    0           5.145648    5.173890    5.092447
              Std Err     .           0.477447    0.352344    0.303155
  Rural       Estimate   -1.454439   -1.410680   -1.439905    0.005439
              Std Err     0.357766    0.344892    0.357340    1.403805
  Alcohol     Estimate    2.992035    2.982397    2.974996    1.082457
              Std Err     0.201343    0.196014    0.200391    1.768137
  MealsOut    Estimate    0          -2.827608   -1.922421   -2.676694
              Std Err     .           0.384842    0.808967    0.262948


Other Capabilities

PROC PANEL can also do much more:

Two-way models

Dynamic-panel models

Adjustments for serial correlation, heteroscedasticity, and clustering

Unit root tests

Model specification tests (e.g. Durbin-Watson)


Summary

Panel data offer modeling advantages

Use PROC PANEL in SAS/ETS for panel data regression

Many estimators available depending on assumptions

Correlated individual effects can be problematic, but there are solutions

New features in SAS/ETS 14.1
