Modeling Panel Data: Choosing the Correct Strategy
Roberto G. Gutierrez
2 / 25 #analyticsx
Overview
Panel data are ubiquitous, not only in economics but in all fields
Panel data have intrinsic modeling advantages
You model panel data in SAS with the PANEL procedure
Different model alternatives depending on assumptions and properties
Key new features in SAS/ETS 14.1
Panel Data
Panel data consist of a set of individuals measured over several points in time
Known by other names: longitudinal data, cross-sectional time series, clustered data, multilevel (two-level) data, etc.
Data collected in this manner offer key design advantages to modeling
The greatest advantage is that the individuals act as their own control group
Panel Regression Model
Formally, consider the linear regression model
$$y_{it} = \beta_0 + \beta_x X_{it} + \beta_z Z_i + \nu_i + \epsilon_{it}$$
for $i = 1, \ldots, N$ individuals on $t = 1, \ldots, T_i$ time periods.
The $X$ variables vary over time
The $Z$ variables are constant within individuals
The $\nu_i$ are individual (or cross section) effects
The $\epsilon_{it}$ are the observation-level errors
Different estimation strategies for what you are willing to assume about $X$, $Z$, $\nu_i$, and $\epsilon_{it}$
Grocery Data
Consumer-loyalty data from 330 households who shopped at a grocery chain in Raleigh, North Carolina
Monthly expenditures for the year 2011; some monthly data missing
Model meat expenditures on the following factors:
  - Received government assistance for that month?
  - Household size
  - Rural store location visited during the month?
  - Were at least 10% of total expenditures for alcohol?
  - The number of meals per week outside household, as provided on initial survey
Specifically, assess the association with government assistance controlling for the other factors, and for latent “household effects”
Data Statement
data Grocery;
   input HouseID Month Meat Govt Hsize Rural Alcohol MealsOut;
   datalines;
1 1 55.841 1 5 0 1 3
1 3 49.372 1 5 0 1 3
1 4 59.43 1 5 0 1 3
1 5 52.25 1 5 0 1 3
1 6 41.623 1 5 0 0 3
1 7 59.357 1 5 0 1 3
1 9 58.512 1 5 0 0 3
...
330 9 55.264 1 4 0 0 2
330 10 55.096 1 4 0 0 2
330 12 49.676 1 4 0 0 2
;
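Since some monthly records are missing, it can help to confirm how unbalanced the panel is before modeling. The following is a minimal sketch, not part of the original slides: it sorts the data by household and month (the ordering PROC PANEL expects for its ID variables) and lists households with fewer than 12 records. The column name MonthsObserved is a hypothetical label chosen for illustration.

```sas
/* Sort by cross section (HouseID), then by time (Month) within cross section */
proc sort data = Grocery;
   by HouseID Month;
run;

/* List households observed in fewer than 12 months of 2011.
   MonthsObserved is an illustrative column name, not from the slides. */
proc sql;
   select HouseID, count(*) as MonthsObserved
   from Grocery
   group by HouseID
   having count(*) < 12;
quit;
```

An unbalanced panel is not a problem for the estimators discussed here; the check is only meant to show how much data each household contributes.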
Random-Effects Estimation
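Model is
$$\text{Meat}_{it} = \beta_0 + \beta_1 \text{Govt}_{it} + \beta_2 \text{Hsize}_i + \beta_3 \text{Rural}_{it} + \beta_4 \text{Alcohol}_{it} + \beta_5 \text{MealsOut}_i + \nu_i + \epsilon_{it}$$
Random-effects estimation is the most common strategy
Treats the households as a random sample and the individual effects $\nu_i$ as uncorrelated with $X$, $Z$, and $\epsilon_{it}$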
A Hausman test is provided as a referendum on that assumption
Also known as generalized least squares (GLS)
proc panel data = Grocery;
   id HouseID Month;
   model Meat = Govt Hsize Rural Alcohol MealsOut / ranone;
run;
Wansbeek and Kapteyn Variance Components (RanOne)
Dependent Variable: Meat (Meat purchases per store visit)

Model Description
Estimation Method            RanOne
Number of Cross Sections     330
Time Series Length           12

Fit Statistics
SSE         84930.9948    DFE         3567
MSE         23.8102       Root MSE    4.8796
R-Square    0.1232
Variance Component Estimates
Variance Component for Cross Sections    190.123
Variance Component for Error             24.99832

Hausman Test for Random Effects
Coefficients    DF    m Value    Pr > m
3               3     25.72      <.0001

Parameter Estimates
Variable     DF    Estimate    Standard Error    t Value    Pr > |t|    Label
Intercept    1     20.50606    2.3327            8.79       <.0001      Intercept
Govt         1     5.050562    0.5989            8.43       <.0001      1 if used government assistance that month
Hsize        1     5.145648    0.4774            10.78      <.0001      Household size
Rural        1     -1.41068    0.3449            -4.09      <.0001      1 if rural location visited at least once
Alcohol      1     2.982397    0.1960            15.22      <.0001      1 if at least 10% spent on alcohol
MealsOut     1     -2.82761    0.3848            -7.35      <.0001      Meals per week outside of household (survey)
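One way to read the variance components is through the textbook view of random-effects GLS as OLS on quasi-demeaned data. The sketch below is an illustration under stated assumptions, not output from PROC PANEL: it uses the standard one-way formula for a household observed in all $T_i = 12$ months, with $\sigma_\nu^2$ the cross-section variance component and $\sigma_\epsilon^2$ the error variance component. The symbol $\theta_i$ is notation introduced here, and the procedure's internal handling of the unbalanced panel is not reproduced.

```latex
\begin{align*}
\theta_i &= 1 - \sqrt{\frac{\sigma_\epsilon^2}{T_i\,\sigma_\nu^2 + \sigma_\epsilon^2}}
          = 1 - \sqrt{\frac{24.998}{12 \times 190.123 + 24.998}}
          \approx 0.896 \\
y_{it} - \theta_i \bar{y}_i &= \text{quasi-demeaned response used in the GLS regression}
\end{align*}
```

A value of $\theta_i$ near 1 suggests that, for these data, the random-effects estimator leans heavily on within-household variation, while a value near 0 would make it behave like pooled OLS.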
Correlated Individual Effects
The Hausman test calls the random-effects results into question
The problem is that the individual effects are likely correlated with the explanatory variables
This does not happen in designed experiments, but who has that these days?
Does the regression coefficient on Govt represent
  A. The effect of a household going on government assistance; or
  B. A comparison of two different households, one on government assistance throughout and one not?
What do you want it to represent?
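For reference, the Hausman statistic in its usual textbook form compares the random-effects estimates with the fixed-effects (within-household) estimates on the coefficients that both can estimate. The notation below ($\hat\beta_{FE}$, $\hat\beta_{RE}$, $\hat V$) is introduced here for illustration; the slides do not show the formula, and the exact computation inside PROC PANEL is assumed to follow this standard form.

```latex
m = (\hat\beta_{FE} - \hat\beta_{RE})'
    \left[\hat V_{FE} - \hat V_{RE}\right]^{-1}
    (\hat\beta_{FE} - \hat\beta_{RE})
    \;\sim\; \chi^2_k \quad \text{under } H_0
```

Here $k = 3$, which matches the three time-varying regressors (Govt, Rural, Alcohol) and the reported 3 degrees of freedom. The reported $m = 25.72$ is far in the tail of a $\chi^2_3$ distribution, hence the small p-value.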
Fixed-Effects Estimation
Fixed-effects estimation does not assume that the individual effects are uncorrelated with the regressors
It produces regression coefficients that are based solely on within-household comparisons
Equivalent to inserting a dummy regressor for each household
You lose some efficiency from not using any between-household data
Regressors are required to vary within households; coefficients on time-invariant regressors (here Hsize and MealsOut) cannot be estimated
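To see why fixed effects uses only within-household comparisons, a sketch of the standard within transformation may help. The bar notation for household means is introduced here; $\nu_i$ and $\epsilon_{it}$ denote the individual effects and the observation-level errors. Averaging the model over time for each household and subtracting gives:

```latex
\begin{align*}
\bar{y}_i &= \beta_0 + \beta_x \bar{X}_i + \beta_z Z_i + \nu_i + \bar{\epsilon}_i \\
y_{it} - \bar{y}_i &= \beta_x \,(X_{it} - \bar{X}_i) + (\epsilon_{it} - \bar{\epsilon}_i)
\end{align*}
```

Both $\nu_i$ and $Z_i$ cancel in the second line. This is why correlation between $\nu_i$ and the regressors no longer matters, and also why $\beta_z$ is not identified.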
proc panel data = Grocery;
   id HouseID Month;
   model Meat = Govt Hsize Rural Alcohol MealsOut / fixone;
run;
Fixed One-Way Estimates
Dependent Variable: Meat (Meat purchases per store visit)

F Test for No Fixed Effects
Num DF    Den DF    F Value    Pr > F
329       3240      32.06      <.0001

Parameter Estimates
Variable     DF    Estimate    Standard Error    t Value    Pr > |t|    Label
Intercept    1     53.89442    1.6500            32.66      <.0001      Intercept
Govt         1     3.591205    0.6650            5.40       <.0001      1 if used government assistance that month
Hsize        0     0           .                 .          .           Household size
Rural        1     -1.45444    0.3578            -4.07      <.0001      1 if rural location visited at least once
Alcohol      1     2.992035    0.2013            14.86      <.0001      1 if at least 10% spent on alcohol
MealsOut     0     0           .                 .          .           Meals per week outside of household (survey)
Between-Groups Estimation
Rarely useful, but provided for comparison
proc panel data = Grocery;
   id HouseID Month;
   model Meat = Govt Hsize Rural Alcohol MealsOut / btwng;
run;
Between-Groups Estimates
Dependent Variable: Meat (Meat purchases per store visit)

Parameter Estimates
Variable     DF    Estimate    Standard Error    t Value    Pr > |t|    Label
Intercept    1     16.98442    1.7004            9.99       <.0001      Intercept
Govt         1     13.40059    0.9886            13.56      <.0001      1 if used government assistance that month
Hsize        1     5.092447    0.3032            16.80      <.0001      Household size
Rural        1     0.005439    1.4038            0.00       0.9969      1 if rural location visited at least once
Alcohol      1     1.082457    1.7681            0.61       0.5408      1 if at least 10% spent on alcohol
MealsOut     1     -2.67669    0.2629            -10.18     <.0001      Meals per week outside of household (survey)
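The between-groups estimator can be read as OLS on household averages, so it uses one data point per household. A sketch in the usual textbook form follows; the bar notation for household means is introduced here.

```latex
\bar{y}_i = \beta_0 + \beta_x \bar{X}_i + \beta_z Z_i + (\nu_i + \bar{\epsilon}_i),
\qquad i = 1, \ldots, N
```

Because the individual effect $\nu_i$ stays in the error term, any correlation between it and the regressors carries directly into the coefficients. That is one plausible reading of why the Govt estimate here (13.40) is so much larger than the random-effects (5.05) and fixed-effects (3.59) estimates.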
Hausman-Taylor Estimation
Random effects: All regressors uncorrelated with the individual effects $\nu_i$
Fixed effects: They might all be correlated
Hausman-Taylor: Why not stipulate some regressors as correlated, and have the best of both worlds?
Choose correlated variables based on substantive knowledge, or guess; there’s a test for that
Estimation is done using instrumental variables, determined internally
This is a new feature of SAS/ETS 14.1
proc panel data = Grocery;
   id HouseID Month;
   instruments correlated = (Govt MealsOut);
   model Meat = Govt Hsize Rural Alcohol MealsOut / htaylor;
run;
Hausman and Taylor Model for Correlated Individual Effects (HTaylor)
Dependent Variable: Meat (Meat purchases per store visit)

Variance Component Estimates
Variance Component for Cross Sections    97.29627
Variance Component for Error             24.97519

Hausman Test against Fixed Effects
Coefficients    DF    m Value    Pr > m
3               1     0.76       0.3824

Parameter Estimates
Variable     Type    DF    Estimate    Standard Error    t Value    Pr > |t|    Label
Intercept            1     19.12589    2.4038            7.96       <.0001      Intercept
Govt         C       1     3.583391    0.6649            5.39       <.0001      1 if used government assistance that month
Hsize        TI      1     5.17389     0.3523            14.68      <.0001      Household size
Rural                1     -1.43991    0.3573            -4.03      <.0001      1 if rural location visited at least once
Alcohol              1     2.974996    0.2004            14.85      <.0001      1 if at least 10% spent on alcohol
MealsOut     C TI    1     -1.92242    0.8090            -2.38      0.0175      Meals per week outside of household (survey)
C: correlated with the individual effects
TI: constant (time-invariant) within cross sections
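The "determined internally" instruments can be sketched with the usual textbook description of the Hausman-Taylor estimator. The grouping labels $X_1, X_2, Z_1, Z_2$ and the counts $k_1, g_2$ are notation introduced here, and the exact instrument set PROC PANEL builds is assumed to follow this standard construction.

```latex
\begin{align*}
X_1 &= (\text{Rural}, \text{Alcohol}) && \text{time-varying, uncorrelated with } \nu_i \\
X_2 &= (\text{Govt}) && \text{time-varying, correlated with } \nu_i \\
Z_1 &= (\text{Hsize}) && \text{time-invariant, uncorrelated with } \nu_i \\
Z_2 &= (\text{MealsOut}) && \text{time-invariant, correlated with } \nu_i \\[4pt]
\text{Instruments:}\quad & X_{1,it} - \bar{X}_{1,i},\;\; X_{2,it} - \bar{X}_{2,i},\;\; \bar{X}_{1,i},\;\; Z_{1,i} \\
\text{Order condition:}\quad & k_1 \ge g_2 \;\Rightarrow\; 2 \ge 1
\end{align*}
```

With $k_1 = 2$ uncorrelated time-varying regressors and $g_2 = 1$ correlated time-invariant regressor, there is $k_1 - g_2 = 1$ overidentifying restriction. This agrees with the single degree of freedom in the Hausman test against fixed effects. The p-value of 0.3824 gives no evidence against the chosen split.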
The COMPARE Statement
The COMPARE statement is another new feature of PROC PANEL in SAS/ETS 14.1
Makes it easy to compare various models and estimators side by side
proc panel data = Grocery;
   id HouseID Month;
   instruments correlated = (Govt MealsOut);
   model Meat = Govt Hsize Rural Alcohol MealsOut / ranone fixone btwng htaylor;
   compare;
run;
Model Comparison
Dependent Variable: Meat (Meat purchases per store visit)

Comparison of Model Parameter Estimates
                        Model 1      Model 1      Model 1      Model 1
Variable                FixOne       RanOne       HTaylor      BtwGrps
Intercept   Estimate    53.894415    20.506060    19.125895    16.984418
            Std Err     1.649992     2.332669     2.403772     1.700415
Govt        Estimate    3.591205     5.050562     3.583391     13.400587
            Std Err     0.665025     0.598942     0.664876     0.988573
Hsize       Estimate    0            5.145648     5.173890     5.092447
            Std Err     .            0.477447     0.352344     0.303155
Rural       Estimate    -1.454439    -1.410680    -1.439905    0.005439
            Std Err     0.357766     0.344892     0.357340     1.403805
Alcohol     Estimate    2.992035     2.982397     2.974996     1.082457
            Std Err     0.201343     0.196014     0.200391     1.768137
MealsOut    Estimate    0            -2.827608    -1.922421    -2.676694
            Std Err     .            0.384842     0.808967     0.262948
Other Capabilities
PROC PANEL can also do much more:
Two-way models
Dynamic-panel models
Adjustments for serial correlation, heteroscedasticity, and clustering
Unit root tests
Model specification tests (e.g. Durbin-Watson)
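As a rough illustration of two items on this list, the sketch below fits a two-way fixed-effects model and a one-way fixed-effects model with cluster-adjusted standard errors. It is not from the original slides. The FIXTWO, HCCME=, and CLUSTER model options are assumed to be available in your SAS/ETS release, so check the PROC PANEL documentation before relying on them. Hsize and MealsOut are left out because they do not vary within households.

```sas
/* Two-way fixed effects: household effects plus month effects */
proc panel data = Grocery;
   id HouseID Month;
   model Meat = Govt Rural Alcohol / fixtwo;
run;

/* One-way fixed effects with heteroscedasticity-consistent standard
   errors adjusted for clustering by household.
   HCCME= and CLUSTER are assumed option names; verify for your release. */
proc panel data = Grocery;
   id HouseID Month;
   model Meat = Govt Rural Alcohol / fixone hccme = 1 cluster;
run;
```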
Summary
Panel data offer modeling advantages
Use PROC PANEL in SAS/ETS for panel data regression
Many estimators available depending on assumptions
Correlated individual effects can be problematic, but there are solutions
New features in SAS/ETS 14.1