linear regression slides
TRANSCRIPT
-
7/29/2019 Linear Regression Slides
1/129
Univariate RegressionMultivariate Regression
Specification IssuesInference
Linear Regression
Michael R. Roberts
Department of FinanceThe Wharton School
University of Pennsylvania
October 5, 2009
Michael R. Roberts Linear Regression 1/129
-
7/29/2019 Linear Regression Slides
2/129
Univariate RegressionMultivariate Regression
Specification IssuesInference
BasicsOrdinary Least Squares (OLS) EstimatesUnits of Measurement and Functional FormOLS Estimator Properties
Motivation
Linear regression is arguably the most popular modeling approachacross every field in the social sciences.
1 Very robust technique2 Linear regression also provides a basis for more advanced empirical
methods.3 Transparent and relatively easy to understand technique4 Useful for both descriptive and structural analysis
Were going to learn linear regression inside and out from an appliedperspective
focusing on the appropriateness of different assumptions, modelbuilding, and interpretation
This lecture draws heavily from Wooldridges undergraduate andgraduate texts, as well as Greenes graduate text.
Michael R. Roberts Linear Regression 2/129
-
7/29/2019 Linear Regression Slides
3/129
Univariate RegressionMultivariate Regression
Specification IssuesInference
BasicsOrdinary Least Squares (OLS) EstimatesUnits of Measurement and Functional FormOLS Estimator Properties
Terminology
The simple linear regression model (a.k.a. - bivariate linearregression model, 2-variable linear regression model)
y = + x + u (1)
y = dependent variable, outcome variable, response variable,explained variable, predicted variable, regressand
x = independent variable, explanatory variable, controlvariable, predictor variable, regressor, covariate
u = error term, disturbance
= intercept parameter
= slope parameter
Michael R. Roberts Linear Regression 3/129
-
7/29/2019 Linear Regression Slides
4/129
Univariate RegressionMultivariate Regression
Specification IssuesInference
BasicsOrdinary Least Squares (OLS) EstimatesUnits of Measurement and Functional FormOLS Estimator Properties
Details
Recall model is
y = + x + u
(y, x, u) are random variables(y, x) are observable (we can sample observations on them)
u is unobservable = no stat tests involving u(, ) are unobserved but estimable under certain conds
Model implies that u captures everything that determines y exceptfor x
In natural sciences, this often includes frictions, air resistance, etc.
In social sciences, this often includes a lot of stuff!!!
Michael R. Roberts Linear Regression 4/129
U i i R i B i
-
7/29/2019 Linear Regression Slides
5/129
Univariate RegressionMultivariate Regression
Specification IssuesInference
BasicsOrdinary Least Squares (OLS) EstimatesUnits of Measurement and Functional FormOLS Estimator Properties
Assumptions
1 E(u) = 0As long as we have an intercept, this assumption is innocuousImagine E(u) = k = 0. We can rewrite u = k + w =
yi = ( + k) + E(xi) + w
where E() = 0. Any non-zero mean is absorbed by the intercept.
2 E(u|x) = E(u)Assuming q u (= orthogonal) is not enough! Correlation onlymeasures linear dependence
Conditional mean independenceImplied by full independence q u (= independent)Implies uncorrelatedIntuition: avg of u does not depend on the value of qCan combine with zero mean assumption to get zero conditionalmean assumption E(u
|q) = E(u) = 0
Michael R. Roberts Linear Regression 5/129
U i i t R i B i
-
7/29/2019 Linear Regression Slides
6/129
Univariate RegressionMultivariate Regression
Specification IssuesInference
BasicsOrdinary Least Squares (OLS) EstimatesUnits of Measurement and Functional FormOLS Estimator Properties
Conditional Mean Independence (CMI)
This is the key assumption in most applications
Can we test it?
Run regression.
Take residuals u = y y & see if avg u at each value of x = 0?Or, see if residuals are uncorrelated with xDoes these exercise make sense?
Can we think about it?
The assumption says that no matter whether x is low, medium, or
high, the unexplained portion of y is, on average, the same (0).But, what if agents (firms, etc.) with different values of x aredifferent along other dimensions that matter for y?
Michael R. Roberts Linear Regression 6/129
Uni ariate Regression Basics
-
7/29/2019 Linear Regression Slides
7/129
Univariate RegressionMultivariate Regression
Specification IssuesInference
BasicsOrdinary Least Squares (OLS) EstimatesUnits of Measurement and Functional FormOLS Estimator Properties
CMI Example 1: Capital Structure
Consider the regression
Leveragei = + Profitabilityi + ui
CMI = that average u for each level of Profitability is the sameBut, unprofitable firms tend to have higher bankruptcy risk and
should have lower leverage than more profitable firms according totradeoff theoryOr, unprofitable firms have accumulated fewer profits and may beforced to debt financing, implying higher leverage according to thepecking orderThese e.g.s show that the average u is likely to vary with the levelof profitability
1st e.g., low profitable firms will be less levered implies lower avg ufor less profitable firms2nd e.g., low profitable firms will be more levered implies higher avg
u for less profitable firmsMichael R. Roberts Linear Regression 7/129
Univariate Regression Basics
-
7/29/2019 Linear Regression Slides
8/129
Univariate RegressionMultivariate Regression
Specification IssuesInference
BasicsOrdinary Least Squares (OLS) EstimatesUnits of Measurement and Functional FormOLS Estimator Properties
CMI Example 2: Investment
Consider the regression
Investmenti = + qi + ui
CMI =
that average u for each level of q is the same
But, firms with low q may be in distress and invest less
Or, firms with high q may have difficultly raising sufficient capital tofinance their investment
These e.g.s show that the average u is likely to vary with the levelof q
1st e.g., low q firms will invest less implies higher avg u for low qfirms2nd e.g., high q firms will invest less implies higher avg u for low qfirms
Michael R. Roberts Linear Regression 8/129
Univariate Regression Basics
-
7/29/2019 Linear Regression Slides
9/129
Univariate RegressionMultivariate Regression
Specification IssuesInference
BasicsOrdinary Least Squares (OLS) EstimatesUnits of Measurement and Functional FormOLS Estimator Properties
Population Regression Function (PRF)
PRF is E(y|x). It is fixed but unknown. For simple linear regression:PRF = E(y|x) = + x (2)
Intuition: for any value of x, distribution of y is centered about
E(y|x)
Michael R. Roberts Linear Regression 9/129
Univariate Regression Basics
-
7/29/2019 Linear Regression Slides
10/129
Univariate RegressionMultivariate Regression
Specification IssuesInference
BasicsOrdinary Least Squares (OLS) EstimatesUnits of Measurement and Functional FormOLS Estimator Properties
OLS Regression Line
We dont observe PRF, but we can estimate via OLS
yi = + xi + ui (3)
for each sample point iWhat is ui? It contains all of the factors affecting yi other than xi.
= ui contains a lot of stuff! Consider complexity ofy is individual food expendituresy is corporate leverage ratiosy is interest rate spread on a bond
Estimated Regression Line (a.k.a. Sample Regression Function
(SRF))
y = + x (4)
Plug in an x and out comes an estimate of y, y
Note: Different sample =
different (, )
Michael R. Roberts Linear Regression 10/129
Univariate Regression Basics
-
7/29/2019 Linear Regression Slides
11/129
Uni ariate RegressionMultivariate Regression
Specification IssuesInference
BasicsOrdinary Least Squares (OLS) EstimatesUnits of Measurement and Functional FormOLS Estimator Properties
OLS Estimates
Estimators:
Slope = = Ni=1(xi x)(yi y)
Ni=1(xi x)2Intercept = = y x
Population analogues
Slope = Cov(x, y)Var(x)
= Corr(x, y) SD(y)SD(x)
Intercept = E(y) E(x)
Michael R. Roberts Linear Regression 11/129
Univariate Regression Basics
-
7/29/2019 Linear Regression Slides
12/129
gMultivariate Regression
Specification IssuesInference
Ordinary Least Squares (OLS) EstimatesUnits of Measurement and Functional FormOLS Estimator Properties
The Picture
Michael R. Roberts Linear Regression 12/129
Univariate Regression Basics
-
7/29/2019 Linear Regression Slides
13/129
gMultivariate Regression
Specification IssuesInference
Ordinary Least Squares (OLS) EstimatesUnits of Measurement and Functional FormOLS Estimator Properties
Example: CEO Compensation
Model
salary = + ROE + y
Sample 209 CEOs in 1990. Salaries in $000s and ROE in % points.
SRF
salary = 963.191 + 18.501ROE
What do the coefficients tell us?Is the key CMI assumption likely to be satisfied?
Is ROE the only thing that determines salary?Is the relationship linear? = estimated change is constant acrosssalary and ROE
dy/dx = indep of salary & ROE
Is the relationship constant across CEOs?
Michael R. Roberts Linear Regression 13/129
Univariate Regression Basics
-
7/29/2019 Linear Regression Slides
14/129
Multivariate RegressionSpecification Issues
Inference
Ordinary Least Squares (OLS) EstimatesUnits of Measurement and Functional FormOLS Estimator Properties
PRF vs. SRF
Michael R. Roberts Linear Regression 14/129
Univariate RegressionM l i i R i
BasicsO di L S (OLS) E i
-
7/29/2019 Linear Regression Slides
15/129
Multivariate RegressionSpecification Issues
Inference
Ordinary Least Squares (OLS) EstimatesUnits of Measurement and Functional FormOLS Estimator Properties
Goodness-of-Fit (R2)
R-squared defined as
R2 = SSE/SST = 1 SSR/SSTwhere
SSE = Sum of Squares Explained =Ni=1
(yi y)2
SST = Sum of Squares Total =N
i=1(yi y)2
SSR = Sum of Squares Residual =
Ni=1
(ui u)2 =Ni=1
ui2
R2 = [Corr(y, y)]2
Michael R. Roberts Linear Regression 15/129
Univariate RegressionM lti i t R i
BasicsO di L t S (OLS) E ti t
-
7/29/2019 Linear Regression Slides
16/129
Multivariate RegressionSpecification Issues
Inference
Ordinary Least Squares (OLS) EstimatesUnits of Measurement and Functional FormOLS Estimator Properties
Example: CEO Compensation
Model
salary = + ROE + y
R2 = 0.0132
What does this mean?
Michael R. Roberts Linear Regression 16/129
Univariate RegressionMultivariate Regression
BasicsOrdinary Least Squares (OLS) Estimates
-
7/29/2019 Linear Regression Slides
17/129
Multivariate RegressionSpecification Issues
Inference
Ordinary Least Squares (OLS) EstimatesUnits of Measurement and Functional FormOLS Estimator Properties
Scaling the Dependent Variable
Consider CEO SRF
salary = 963.191 + 18.501ROE
Change measurement of salary from $000s to $s. What happens?
salary = 963, 191 + 18, 501ROE
More generally, multiplying dependent variable by constant c =OLS intercept and slope are also multiplied by c
y = + x + u
cy = (c) + (c)x + cu(Note: variance of error affected as well.)
Scaling = multiplying every observation by same #No effect on R2 - invariant to changes in units
Michael R. Roberts Linear Regression 17/129
Univariate RegressionMultivariate Regression
BasicsOrdinary Least Squares (OLS) Estimates
-
7/29/2019 Linear Regression Slides
18/129
Multivariate RegressionSpecification Issues
Inference
Ordinary Least Squares (OLS) EstimatesUnits of Measurement and Functional FormOLS Estimator Properties
Scaling the Independent Variable
Consider CEO SRF
salary = 963.191 + 18.501ROE
Change measurement of ROE from percentage to decimal (i.e.,multiply every observations ROE by 1/100)
salary = 963.191 + 1, 850.1ROE
More generally, multiplying independent variable by constantc = OLS intercept is unchanged but slope is divided by c
y = + x + u
y = + (/c)cx + cuScaling = multiplying every observation by same #No effect on R2 - invariant to changes in units
Michael R. Roberts Linear Regression 18/129
Univariate RegressionMultivariate Regression
BasicsOrdinary Least Squares (OLS) Estimates
-
7/29/2019 Linear Regression Slides
19/129
Multivariate RegressionSpecification Issues
Inference
Ordinary Least Squares (OLS) EstimatesUnits of Measurement and Functional FormOLS Estimator Properties
Changing Units of Both y and x
Model:
y = + x + u
What happens to intercept and slope when we scale y by c and x byk?
cy = c + cx + cu
cy = (c) + (c/k)kx + cu
intercept scaled by c, slope scaled by c/k
Michael R. Roberts Linear Regression 19/129
Univariate RegressionMultivariate Regression
BasicsOrdinary Least Squares (OLS) Estimates
-
7/29/2019 Linear Regression Slides
20/129
Multivariate RegressionSpecification Issues
Inference
Ordinary Least Squares (OLS) EstimatesUnits of Measurement and Functional FormOLS Estimator Properties
Shifting Both y and x
Model:
y = + x + u
What happens to intercept and slope when we add c and k to y andx?
c + y = c + + x + u
c + y = c + + (x + k)
k + u
c + y = (c + k) + (x + k) + u
Intercept shifted by k, slope unaffected
Michael R. Roberts Linear Regression 20/129
Univariate RegressionMultivariate Regression
BasicsOrdinary Least Squares (OLS) Estimates
-
7/29/2019 Linear Regression Slides
21/129
gSpecification Issues
Inference
y q ( )Units of Measurement and Functional FormOLS Estimator Properties
Incorporating Nonlinearities
Consider a traditional wage-education regression
wage = + education + u
This formulation assumes change in wages is constant for all
educational levelsE.g., increasing education from 5 to 6 years leads to the same $increase in wages as increasing education from 11 to 12, or 15 to16, etc.
Better assumption is that each year of education leads to a constantproportionate (i.e., percentage) increase in wages
Approximation of this intuition captured by
log(wage) = + education + u
Michael R. Roberts Linear Regression 21/129
Univariate RegressionMultivariate Regression
BasicsOrdinary Least Squares (OLS) Estimates
-
7/29/2019 Linear Regression Slides
22/129
gSpecification Issues
Inference
y q ( )Units of Measurement and Functional FormOLS Estimator Properties
Log Dependent Variables
Percentage change in wage given one unit increase in education is
%wage (100)educ
Percent change in wage is constant for each additional year ofeducation
= Change in wage for an extra year of education increases aseducation increases.
I.e., increasing return to education (assuming > 0)
Log wage is linear in education. Wage is nonlinear
log(wage) = + education + u
= wage = exp ( + education + u)
Michael R. Roberts Linear Regression 22/129
Univariate RegressionMultivariate Regression
BasicsOrdinary Least Squares (OLS) Estimates
-
7/29/2019 Linear Regression Slides
23/129
Specification IssuesInference
Units of Measurement and Functional FormOLS Estimator Properties
Log Wage Example
Sample of 526 individuals in 1976. Wages measured in $/hour.Education = years of education.
SRF:
log(wage) = 0.584 + 0.083education, R2 = 0.186
Interpretation:
Each additional year of education leads to an 8.3% increase in wages(NOT log(wages)!!!).For someone with no education, their wage is exp(0.584)...this ismeaningless because no one in sample has education=0.
Ignores other nonlinearities. E.g., diploma effects at 12 and 16.
Michael R. Roberts Linear Regression 23/129
Univariate RegressionMultivariate Regression
BasicsOrdinary Least Squares (OLS) Estimates
-
7/29/2019 Linear Regression Slides
24/129
Specification IssuesInference
Units of Measurement and Functional FormOLS Estimator Properties
Constant Elasticity Model
Alter CEO salary model
log(salary) = + log(sales) + u
is the elasticity of salary w.r.t. salesSRF
log(salary) = 4.822 + 0.257log(sales), R20.211
Interpretation: For each 1% increase in sales, salary increase by0.257%
Intercept meaningless...no firm has 0 sales.
Michael R. Roberts Linear Regression 24/129
Univariate RegressionMultivariate Regression
S ifi i I
BasicsOrdinary Least Squares (OLS) EstimatesU i f M d F i l F
-
7/29/2019 Linear Regression Slides
25/129
Specification IssuesInference
Units of Measurement and Functional FormOLS Estimator Properties
Changing Units in Log-Level Model
What happens to intercept and slope if we units of dependentvariable when its in log form?
log(y) = + x + u
log(c) + log(y) = log(c) + + x + u log(cy) = (log(c) + ) + x + u
Intercept shifted by log(c), slope unaffected because slope measuresproportionate change in log-log model
Michael R. Roberts Linear Regression 25/129
Univariate RegressionMultivariate Regression
S ifi ti I
BasicsOrdinary Least Squares (OLS) EstimatesU it f M t d F ti l F
-
7/29/2019 Linear Regression Slides
26/129
Specification IssuesInference
Units of Measurement and Functional FormOLS Estimator Properties
Changing Units in Level-Log Model
What happens to intercept and slope if we units of independentvariable when its in log form?
y = + log(x) + u
log(c) + y = + log(x) + log(c) + u y = ( log(c)) + log(cx) + u
Slope measures proportionate change
Michael R. Roberts Linear Regression 26/129
Univariate RegressionMultivariate Regression
Specification Issues
BasicsOrdinary Least Squares (OLS) EstimatesUnits of Measurement and Functional Form
-
7/29/2019 Linear Regression Slides
27/129
Specification IssuesInference
Units of Measurement and Functional FormOLS Estimator Properties
Changing Units in Log-Log Model
What happens to intercept and slope if we units of dependentvariable?
log(y) = + log(x) + u
log(c) + log(y) = log(c) + + log(x) + u log(cy) = ( + log(c)) + log(x) + u
What happens to intercept and slope if we units of independentvariable?
log(y) = + log(x) + u
log(c) + log(y) = + log(x) + log(c) + u log(y) = ( log(c)) + log(cx) + u
Michael R. Roberts Linear Regression 27/129
Univariate RegressionMultivariate Regression
Specification Issues
BasicsOrdinary Least Squares (OLS) EstimatesUnits of Measurement and Functional Form
-
7/29/2019 Linear Regression Slides
28/129
Specification IssuesInference
Units of Measurement and Functional FormOLS Estimator Properties
Log Functional Forms
Dependent Independent InterpretationModel Variable Variable of
Level-level y x dy = dx
Level-log y log(x) dy = (/100)%dxLog-level log(y) x %dy = (100)dxLog-log log(y) log(x) %dy = %dx
E.g., In Log-level model, 100
= % change in y for a 1 unit
increase in x (100 = semi-elasticity)
E.g., In Log-log model, = % change in y for a 1% change in x (= elasticity)
Michael R. Roberts Linear Regression 28/129
Univariate RegressionMultivariate Regression
Specification Issues
BasicsOrdinary Least Squares (OLS) EstimatesUnits of Measurement and Functional Form
-
7/29/2019 Linear Regression Slides
29/129
Specification IssuesInference
Units of Measurement and Functional FormOLS Estimator Properties
Unbiasedness
When is OLS unbiased (i.e., E() = )?1 Model is linear in parameters2 We have a random sample (e.g., self selection)3 Sample outcomes on x vary (i.e., no collinearity with intercept)4 Zero conditional mean of errors (i.e., E(u|x) = 0)
Unbiasedness is a feature of sampling distributions of and .
For a given sample, we hope and are close to true values.
Michael R. Roberts Linear Regression 29/129
Univariate RegressionMultivariate Regression
Specification Issues
BasicsOrdinary Least Squares (OLS) EstimatesUnits of Measurement and Functional Form
-
7/29/2019 Linear Regression Slides
30/129
Specification IssuesInference
Units of Measurement and Functional FormOLS Estimator Properties
Variance of OLS Estimators
Homoskedasticity = Var(u|x) = 2Heterokedasticity = Var(u|x) = f(x) R+
Michael R. Roberts Linear Regression 30/129
Univariate RegressionMultivariate Regression
Specification Issues
BasicsOrdinary Least Squares (OLS) EstimatesUnits of Measurement and Functional Form
-
7/29/2019 Linear Regression Slides
31/129
pInference OLS Estimator Properties
Standard Errors
Remember, larger error variance = larger Var() = biggerSEs
Intuition: More variation in unobservables affecting y makes it hard
to precisely estimate
Relatively more variation in x is our friend!!!
More variation in x means lower SEs for
Likewise, larger samples tend to increase variation in x which also
means lower SEs for
I.e., we like big samples for identifying !
Michael R. Roberts Linear Regression 31/129
Univariate RegressionMultivariate Regression
Specification Issues
Mechanics and InterpretationOLS Estimator PropertiesFurther Issues
-
7/29/2019 Linear Regression Slides
32/129
pInference Binary Independent Variables
Basics
Multiple Linear Regression Model
y = 0 + 1x1 + 2x2 + ... + kxk + u
Same notation and terminology as before.Similar key identifying assumptions1 No perfect collinearity among covariates2 E(u|x1,...xk) = 0 = at a minimum no correlation and we have
correctly accounted for the functional relationships between y and
(x1, ..., xk)SRF
y = 0 + 1x1 + 2x2 + ... + kxk
Michael R. Roberts Linear Regression 32/129
Univariate RegressionMultivariate Regression
Specification Issues
Mechanics and InterpretationOLS Estimator PropertiesFurther Issues
-
7/29/2019 Linear Regression Slides
33/129
Inference Binary Independent Variables
Interpretation
Estimated intercept beta0 is predicted value of y when all x = 0.Sometimes this makes sense, sometimes it doesnt.
Estimated slopes (1,...k) have partial effect interpretations
y =
1x1 + ... +
kxk
I.e., given changes in x1 through xk, (x1,..., xk), we obtain thepredicted change in y.
When all but one covariate, e.g., x1, is held fixed so
(x2,..., xk) = (0,..., 0) then
y = 1x1
I.e., 1 is the coefficient holding all else fixed (ceteris paribus)
Michael R. Roberts Linear Regression 33/129
Univariate RegressionMultivariate Regression
Specification Issuesf
Mechanics and InterpretationOLS Estimator PropertiesFurther Issues
-
7/29/2019 Linear Regression Slides
34/129
Inference Binary Independent Variables
Example: College GPA
SRF of college GPA and high school GPA (4-point scales) and ACTscore for N = 141 university students
colGPA = 1.29 + 0.453hsGPA + 0.0094ACT
What do intercept and slopes tell us?Consider two students, Fred and Bob, with identical ACT score buthsGPA of Fred is 1 point higher than that of Bob. Best prediction ofFreds colGPA is 0.453 points higher than that of Bob.
SRF without hsGPA
colGPA = 1.29 + 0.0271ACT
Whats different and why? Can we use it to compare 2 people withsame hsGPA?
Michael R. Roberts Linear Regression 34/129
Univariate RegressionMultivariate RegressionSpecification Issues
I f
Mechanics and InterpretationOLS Estimator PropertiesFurther IssuesBi I d d t V i bl
-
7/29/2019 Linear Regression Slides
35/129
Inference Binary Independent Variables
All Else Equal
Consider prev example. Holding ACT fixed, another point on highschool GPA is predicted to inc college GPA by 0.452 points.
If we could collect a sample of individuals with the same high schoolACT, we could run a simple regression of college GPA on highschool GPA. This holds all else, ACT, fixed.
Multiple regression mimics this scenario without restricting the
values of any independent variables.
Michael R. Roberts Linear Regression 35/129
Univariate RegressionMultivariate RegressionSpecification Issues
Inference
Mechanics and InterpretationOLS Estimator PropertiesFurther IssuesBinar Inde endent Variables
-
7/29/2019 Linear Regression Slides
36/129
Inference Binary Independent Variables
Changing Multiple Independent Variables Simultaneously
Each corresponds to the partial effect of its covariate
What if we want to change more than one variable at the sametime?
E.g., What is the effect of increasing the high school GPA by 1
point and the ACT score by 1 points?
colGPA = 0.453hsGPA + 0.0094ACT = 0.4624
E.g., What is the effect of increasing the high school GPA by 2
point and the ACT score by 10 points?
colGPA = 0.453hsGPA + 0.0094ACT
= 0.453 2 + 0.0094 10 = 1
Michael R. Roberts Linear Regression 36/129
Univariate RegressionMultivariate RegressionSpecification Issues
Inference
Mechanics and InterpretationOLS Estimator PropertiesFurther IssuesBinary Independent Variables
-
7/29/2019 Linear Regression Slides
37/129
Inference Binary Independent Variables
Fitted Values and Residuals
Residual = ui = yi
yi
Properties of residuals and fitted values:1 sample avg of residuals = 0 = y = y2 sample cov between each indep variable and residuals = 03 Point of means (y, x1, ..., xk) lies on regression line.
Michael R. Roberts Linear Regression 37/129
Univariate RegressionMultivariate RegressionSpecification Issues
Inference
Mechanics and InterpretationOLS Estimator PropertiesFurther IssuesBinary Independent Variables
-
7/29/2019 Linear Regression Slides
38/129
Inference Binary Independent Variables
Partial Regression
Consider 2 independent variable model
y = 0 + 1x1 + 2x2 + u
Whats the formula for just 1?
1 = (r
1r1)1r1y
where r1 are the residuals from a regression of x1 on x2.
In other words,1 regress x1 on x2 and save residuals2 regress y on residuals3 coefficient on residuals will be identical to 1 in multivariate
regression
Michael R. Roberts Linear Regression 38/129
Univariate RegressionMultivariate RegressionSpecification Issues
Inference
Mechanics and InterpretationOLS Estimator PropertiesFurther IssuesBinary Independent Variables
-
7/29/2019 Linear Regression Slides
39/129
Inference Binary Independent Variables
Frisch-Waugh-Lovell I
More generally, consider general linear setup
y = XB + u = X1B1 + X2B2 + u
One can show that
B2 = (X2M1X2)1(X2M1y) (5)
where
M1 = (I P1) = I X1(X1X1)1X1)P1 is the projection matrix that takes a vector (y) and projects itonto the space spanned by columns of X1
M1 is the orthogonal compliment, projecting a vector onto the spaceorthogonal to that spanned by X1
Michael R. Roberts Linear Regression 39/129
Univariate RegressionMultivariate RegressionSpecification Issues
Inference
Mechanics and InterpretationOLS Estimator PropertiesFurther IssuesBinary Independent Variables
-
7/29/2019 Linear Regression Slides
40/129
Inference Binary Independent Variables
Frisch-Waugh-Lovell II
What does equation (5) mean?
Since M1 is idempotent
B2 = (X2M1M1X2)1(X2M1M1y)
= (X2X2)1(X2y)
So B2 can be obtained by a simple multivariate regression of y on X2
But y and X2 are just the residuals obtained from regressing y andeach component of X2 on the X1 matrix
Michael R. Roberts Linear Regression 40/129
Univariate RegressionMultivariate RegressionSpecification Issues
Inference
Mechanics and InterpretationOLS Estimator PropertiesFurther IssuesBinary Independent Variables
-
7/29/2019 Linear Regression Slides
41/129
y p
Omitted Variables Bias
Assume correct model is:y = XB + u = X1B1 + X2B2 + u
Assume we incorrectly regress y on just X1. Then
B1 = (X
1X1)1X1y
= (X1X1)1X1(X1B1 + X2B2 + u)
= B1 + (X
1X1)1X1X2B2 + (X
1X1)1X1u
Take expectations and we get
B1 = B1 + (X
1X1)
1
X
1X2B2Note (X1X1)
1X1X2 is the column of slopes in the OLS regressionof each column of X2 on the columns of X1OLS is biased because of omitted variables and direction is unclear depending on multiple partial effects
Michael R. Roberts Linear Regression 41/129
Univariate RegressionMultivariate RegressionSpecification Issues
Inference
Mechanics and InterpretationOLS Estimator PropertiesFurther IssuesBinary Independent Variables
-
7/29/2019 Linear Regression Slides
42/129
Bivariate Model
With two variable setup, inference is easier
y = 0 + 1x1 + 2x2 + u
Assume we incorrectly regress y on just x1. Then
1 = 1 + (x
1x1)1x1x22
= 1 + 2
Bias term consists of 2 terms:1 = slope from regression of x2 on x12 2 = slope on x2 from multiple regression of y on (x1, x2)
Direction of bias determined by signs of and 2.
Magnitude of bias determined by magnitudes of and 2.
Michael R. Roberts Linear Regression 42/129
Univariate RegressionMultivariate RegressionSpecification Issues
Inference
Mechanics and InterpretationOLS Estimator PropertiesFurther IssuesBinary Independent Variables
-
7/29/2019 Linear Regression Slides
43/129
Omitted Variable Bias General Thoughts
Deriving sign of omitted variable bias with multiple regressors inestimated model is hard. Recall general formula
B1 = B1 + (X
1X1)1X1X2B2
(X
1X1)
1X
1X2 is vector of coefficients.Consider a simpler model
y = 0 + 1x1 + 2x2 + 3x3 + u
where we omit x3
Note that both 1 and 2 will be biased because of omission unlessboth x1 and x2 are uncorrelated with x3.
The omission will infect every coefficient through correlations
Michael R. Roberts Linear Regression 43/129
Univariate RegressionMultivariate RegressionSpecification Issues
Inference
Mechanics and InterpretationOLS Estimator PropertiesFurther IssuesBinary Independent Variables
-
7/29/2019 Linear Regression Slides
44/129
Example: Labor
Consider
log(wage) = 0 + 1education + 2ability + u
If we cant measure ability, its in the error term and we estimate
log(wage) = 0 + 1education + w
What is the likely bias in ? recall
1 = 1 + 2
where is the slope from regression of ability on education.
Ability and education are likely positively correlated = > 0Ability and wages are likely positively correlated = 2 > 0So, bias is likely positive =
1 is too big!
Michael R. Roberts Linear Regression 44/129
Univariate RegressionMultivariate RegressionSpecification Issues
Inference
Mechanics and InterpretationOLS Estimator PropertiesFurther IssuesBinary Independent Variables
-
7/29/2019 Linear Regression Slides
45/129
Goodness of Fit
R2 still equal to squared correlation between y and y
Low R2 doesnt mean model is wrong
Can have a low R2 yet OLS estimate may be reliable estimates of
ceteris paribus effects of each independent variableAdjust R2
R2a = 1 (1 R2)n 1
n
k
1
where k = # of regressors excluding intercept
Adjust R2 corrects for df and it can be < 0
Michael R. Roberts Linear Regression 45/129
Univariate RegressionMultivariate RegressionSpecification Issues
Inference
Mechanics and InterpretationOLS Estimator PropertiesFurther IssuesBinary Independent Variables
-
7/29/2019 Linear Regression Slides
46/129
Unbiasedness
When is OLS unbiased (i.e., E() = )?1 Model is linear in parameters
2 We have a random sample (e.g., self selection)3 No perfect collinearity4 Zero conditional mean of errors (i.e., E(u|x) = 0)
Unbiasedness is a feature of sampling distributions of and .
For a given sample, we hope and are close to true values.
Michael R. Roberts Linear Regression 46/129
Univariate RegressionMultivariate RegressionSpecification Issues
Inference
Mechanics and InterpretationOLS Estimator PropertiesFurther IssuesBinary Independent Variables
-
7/29/2019 Linear Regression Slides
47/129
Irrelevant Regressors
What happens when we include a regressor that shouldnt be in the
model? (overspecified)No affect on unbiasedness
Can affect the variances of the OLS estimator
Michael R. Roberts Linear Regression 47/129
Univariate RegressionMultivariate RegressionSpecification Issues
Inference
Mechanics and InterpretationOLS Estimator PropertiesFurther IssuesBinary Independent Variables
-
7/29/2019 Linear Regression Slides
48/129
Variance of OLS Estimators
Sampling variance of OLS slope
Var(j) =2
Ni=1(xij xj)2(1 R2j )
for j = 1,..., k, where R2j is the R2 from regressing xj on all other
independent variables including the intercept and 2 is the varianceof the regression error term.
Note
Bigger error variance (2) = bigger SEs (Add more variables tomodel, change functional form, improve fit!)More sampling variation in xj = smaller SEs (Get a larger sample)Higher collinearity (R2j ) = bigger SEs (Get a larger sample)
Michael R. Roberts Linear Regression 48/129
Univariate RegressionMultivariate RegressionSpecification Issues
Inference
Mechanics and InterpretationOLS Estimator PropertiesFurther IssuesBinary Independent Variables
-
7/29/2019 Linear Regression Slides
49/129
Multicollinearity
Problem of small sample size.No implication for bias or consistency, but can inflate SEs
Consider
y = 0 + 1x1 + 2x2 + 3x3 + u
where x2 and x3 are highly correlated.Var(2) and Var(3) may be large.
But correlation between x2 and x3 has no direct effect on Var(1)
If x1 is uncorrelated with x2 and x3, then R21 = 0 and Var(1) is
unaffected by correlation between x2 and x3Make sure included variables are not too highly correlated with thevariable of interest
Variance Inflation Factor (VIF) = 1/(1 R2j ) above 10 issometimes cause for concern but this is arbitrary and of limited use
Michael R. Roberts Linear Regression 49/129
Univariate RegressionMultivariate Regression
Specification IssuesInference
Mechanics and InterpretationOLS Estimator PropertiesFurther IssuesBinary Independent Variables
-
7/29/2019 Linear Regression Slides
50/129
Data Scaling
No one wants to see a coefficient reported as 0.000000456, or1,234,534,903,875.
Scale the variables for cosmetic purposes:1 Will effect coefficients & SEs2
Wont affect t-stats or inferenceSometimes useful to convert coefficients into comparable units, e.g.,SDs.
1 Can standardize y and xs (i.e., subtract sample avg. & divide bysample SD) before running regression.
2 Estimated coefficients = 1 SD in y given 1 SD in x.Can estimate model on original data, then multiply each coef bycorresponding SD. This marginal effect = in y units for a 1SD in x
Michael R. Roberts Linear Regression 50/129
Univariate RegressionMultivariate Regression
Specification IssuesInference
Mechanics and InterpretationOLS Estimator PropertiesFurther IssuesBinary Independent Variables
-
7/29/2019 Linear Regression Slides
51/129
Log Functional Forms
Consider
log(price) = 0 + 1log(pollution) + 2rooms + u
Interpretation1
1 is the elasticity of price w.r.t. pollution. I.e., a 1% change inpollution generates an 1001% change in price.2 2 is the semi-elasticity of price w.r.t. rooms. I.e., a 1 unit change in
rooms generates an 1002% change in price.
E.g.,
log(price) = 9.23 0.718log(pollution) + 0.306rooms + u
= 1% inc. in pollution = 0.72% dec. in price= 1 unit inc. in rooms = 30.6% inc. in price
Michael R. Roberts Linear Regression 51/129
Univariate RegressionMultivariate Regression
Specification IssuesInference
Mechanics and InterpretationOLS Estimator PropertiesFurther IssuesBinary Independent Variables
-
7/29/2019 Linear Regression Slides
52/129
Log Approximation
Note: percentage change interpretation is only approximate!Approximation error occurs because as log(y) becomes larger,approximation %y 100log(y) becomes more inaccurate. E.g.,
log(y) = 0 + 1log(x1) + 2x2
Fixing x1 (i.e., x1 = 0) = log(y) = 2x2Exact % change is
log(y) = log(y)
logy(y) = 2x2 = 2(x
2
x2)
log(y/y) = 2(x2 x2)y/y = exp(2(x
2 x2))
(y y)/y
% = 100
exp(2(x
2 x2)) 1
Michael R. Roberts Linear Regression 52/129
Univariate RegressionMultivariate Regression
Specification IssuesInference
Mechanics and InterpretationOLS Estimator PropertiesFurther IssuesBinary Independent Variables
Fi f L A i i
-
7/29/2019 Linear Regression Slides
53/129
Figure of Log Approximation
Approximate % change y : log(y) = 2x2
Exact % change y : (y/y)% = 100
exp(2x2)
Michael R. Roberts Linear Regression 53/129
Univariate RegressionMultivariate Regression
Specification IssuesInference
Mechanics and InterpretationOLS Estimator PropertiesFurther IssuesBinary Independent Variables
U f l f L
-
7/29/2019 Linear Regression Slides
54/129
Usefulness of Logs
Logs lead to coefficients with appealing interpretationsLogs allow us to be ignorant about the units of measurement ofvariables appearing in logs since theyre proportionate changes
If y > 0, log can mitigate (eliminate) skew and heteroskedasticity
Logs of y or x can mitigate the influence of outliers by narrowingrange.
Rules of thumb of when to take logs:
positive currency amounts,variable with large integral values (e.g., population, enrollment, etc.)
and when not to take logsvariables measured in years (months),proportions
If y [0, ), can take log(1+y)Michael R. Roberts Linear Regression 54/129
Univariate RegressionMultivariate Regression
Specification IssuesInference
Mechanics and InterpretationOLS Estimator PropertiesFurther IssuesBinary Independent Variables
P P P i Ch
-
7/29/2019 Linear Regression Slides
55/129
Percentage vs. Percentage Point Change
Proportionate (or Relative) Change(x1 x0)/x0 = x/x0
Percentage Change
%x = 100(x/x0)
Percentage Point Change is raw change in percentages.
E.g., let x = unemployment rate in %If unemployment goes from 10% to 9%, then
1% percentage point change,(9-10)/10 = 0.1 proportionate change,100(9-10)/10 = 10% percentage change,
If you use log of a % on LHS, take care to distinguish betweenpercentage change and percentage point change.
Michael R. Roberts Linear Regression 55/129
Univariate RegressionMultivariate Regression
Specification IssuesInference
Mechanics and InterpretationOLS Estimator PropertiesFurther IssuesBinary Independent Variables
M d l ith Q d ti
-
7/29/2019 Linear Regression Slides
56/129
Models with Quadratics
Consider
y = 0 + 1x + 2x2 + u
Partial effect of x
y = (1 + 22x)x = dy/dx = 1 + 22x
= must pick value of x to evaluate (e.g., x)1 > 0, 2 < 0 =
parabolic relation
Turning point = Maximum = 1/(22)Know where the turning point is!. It may lie outside the range of x!Odd values may imply misspecification or be irrelevant (above)
Extension to higher order straightforward
Michael R. Roberts Linear Regression 56/129
Univariate RegressionMultivariate Regression
Specification IssuesInference
Mechanics and InterpretationOLS Estimator PropertiesFurther IssuesBinary Independent Variables
M d l ith I t ti
-
7/29/2019 Linear Regression Slides
57/129
Models with Interactions
Considery = 0 + 1x1 + 2x2 + 3x1x2 + u
Partial effect of x1
y = (1
+ 3
x2
)x1
=
dy/dx1
= 1
+ 3
x2
Partial effect of x1 = 1 x2 = 0. Have to ask if this makessense.
If not, plug in sensible value for x2 (e.g., x2)
Or, reparameterize the model:
y = 0 + 1x1 + 2x2 + 3(x1 1)(x2 2) + uwhere (1, 2) is the population mean of (x1, x2)
2(1) is partial effect of x2(x1) on y at mean value of x1(x2).
Michael R. Roberts Linear Regression 57/129
Univariate RegressionMultivariate Regression
Specification IssuesInference
Mechanics and InterpretationOLS Estimator PropertiesFurther IssuesBinary Independent Variables
Models with Interactions
-
7/29/2019 Linear Regression Slides
58/129
Models with Interactions
Reparameterized model
y = 0 + 1x1 + 2x2 + 3(x1x2 + 12 x12 x21) + u= (0 + 312)
0+ (1 + 32)
1x1
+ (2 + 31) 2
x2 + 3x1x2 + u
For estimation purposes, can use sample mean in place of unknown
population meanEstimating reparameterized model has two benefits:
Provides estimates at average value (1, 2)Provides corresponding standard errors
Michael R. Roberts Linear Regression 58/129
Univariate RegressionMultivariate Regression
Specification IssuesInference
Mechanics and InterpretationOLS Estimator PropertiesFurther IssuesBinary Independent Variables
Predicted Values and SEs I
-
7/29/2019 Linear Regression Slides
59/129
Predicted Values and SEs I
Predicted value:y = 0 + 1x1 + ... + kxk
But this is just an estimate with a standard error. I.e.,
= 0 + 1c1 + ... + kck
where (c1, ..., ck) is a point of evaluation
But is just a linear combination of OLS parameters
We know how to get the SE of this. E.g., k = 1
Var() = Var(0 + 1c1)
= Var(0) + c21 Var(1) + 2c1Cov(0, 1)
Take square root and voila! (Software will do this for you)
Michael R. Roberts Linear Regression 59/129
Univariate Regression
Multivariate RegressionSpecification Issues
Inference
Mechanics and Interpretation
OLS Estimator PropertiesFurther IssuesBinary Independent Variables
Predicted Values and SEs II
-
7/29/2019 Linear Regression Slides
60/129
Predicted Values and SEs II
Alternatively, reparameterize the regression. Note = 0 + 1c1 + ... + kck = 0 = 1c1 ... kck
Plug this into the regression
y = 0 + 1x1 + ... + kxk + uto get
y = 0 + 1(x1 c1) + ... + k(xk ck) + uI.e., subtract the value c
jfrom each observation on x
jand then run
regression on transformed data.
Look at SE on intercept and thats the SE of the predicated value ofy at the point (c1,..., ck)
You can form confidence intervals with this too.
Michael R. Roberts Linear Regression 60/129
Univariate Regression
Multivariate RegressionSpecification Issues
Inference
Mechanics and Interpretation
OLS Estimator PropertiesFurther IssuesBinary Independent Variables
Predicting y with log(y) I
-
7/29/2019 Linear Regression Slides
61/129
Predicting y with log(y) I
SRF:
log(y) = 0 + 1x1 + ... + kxk
Predicted value of y is not exp(log(y))
Recall Jensens inequality for convex function, g:
g
fd
g fd g(E(f)) E(g(f))
In our setting, f = log(y), g=exp(). Jensen =exp{E[log(y)]} E[exp{log(y)}]
We will underestimate y.
Michael R. Roberts Linear Regression 61/129
Univariate Regression
Multivariate RegressionSpecification Issues
Inference
Mechanics and Interpretation
OLS Estimator PropertiesFurther IssuesBinary Independent Variables
Predicting y with log(y) II
-
7/29/2019 Linear Regression Slides
62/129
Predicting y with log(y) II
How can we get a consistent (no unbiased) estimate of y?If u X
E(y|X) = 0exp(0 + 1x1 + ... + kxk)where 0 = E(exp(u))
With an estimate of , we can predict y asy = 0exp(log(y))
which requires exponentiating the predicted value from the logmodel and multiplying by 0
Can estimate 0 with MOM estimator (consistent but biasedbecause of Jensen)
0 = n1
ni=1
exp(ui)
Michael R. Roberts Linear Regression 62/129
Univariate Regression
Multivariate RegressionSpecification Issues
Inference
Mechanics and Interpretation
OLS Estimator PropertiesFurther IssuesBinary Independent Variables
Basics
-
7/29/2019 Linear Regression Slides
63/129
Basics
Qualitative information. Examples,1 Sex of individual (Male, Female)2 Ownership of an item (Own, dont own)3 Employment status (Employed, Unemployed
Code this information using binary or dummy variables. E.g.,
Malei = 1 if person i is Male
0 otherwise
Owni = 1 if person i owns item0 otherwise
Empi = 1 if person i is employed
0 otherwise
Choice of 0 or 1 is relevant only for interpretation.
Michael R. Roberts Linear Regression 63/129
Univariate Regression
Multivariate RegressionSpecification Issues
Inference
Mechanics and Interpretation
OLS Estimator PropertiesFurther IssuesBinary Independent Variables
Single Dummy Variable
-
7/29/2019 Linear Regression Slides
64/129
Single Dummy Variable
Consider
wage = 0 + 0female + 1educ + u
0 measures difference in wage between male and female given samelevel of education (and error term u)
E(wage|female = 0, educ) = 0 + 1educE(wage|female = 1, educ) = 0 + + 1educ
= = E(wage|female = 1, educ) E(wage|female = 0, educ)Intercept for males = 0, females = 0 + 0
Michael R. Roberts Linear Regression 64/129
Univariate Regression
Multivariate RegressionSpecification Issues
Inference
Mechanics and Interpretation
OLS Estimator PropertiesFurther IssuesBinary Independent Variables
Intercept Shift
-
7/29/2019 Linear Regression Slides
65/129
Intercept Shift
Intercept shifts, slope is same.
Michael R. Roberts Linear Regression 65/129
Univariate Regression
Multivariate RegressionSpecification Issues
Inference
Mechanics and Interpretation
OLS Estimator PropertiesFurther IssuesBinary Independent Variables
Wage Example
-
7/29/2019 Linear Regression Slides
66/129
Wage Example
SRF with n = 526, R2 = 0.364
wage = 1.57 1.81female + 0.571educ + 0.025exper + 0.141tenureNegative intercept is intercept for men...meaningless because other
variables are never all = 0Females earn $1.81/hour less than men with the same education,experience, and tenure.
All else equal is important! Consider SRF with n = 526, R2 = 0.116
wage = 7.10 2.51female
Female coefficient is picking up differences due to omitted variables.
Michael R. Roberts Linear Regression 66/129
Univariate Regression
Multivariate RegressionSpecification Issues
Inference
Mechanics and Interpretation
OLS Estimator PropertiesFurther IssuesBinary Independent Variables
Log Dependent Variables
-
7/29/2019 Linear Regression Slides
67/129
og epe de t a ab es
Nothing really new, coefficient has % interpretation.
E.g., house price model with N = 88, R2 = 0.649
price = 1.35 + 0.168log(lotsize) + 0.707log(sqrft)+ 0.027bdrms + 0.054colonial
Negative intercept is intercept for non-colonial homes...meaninglessbecause other variables are never all = 0A colonial style home costs approximately 5.4% more than otherwise
similar homesRemember this is just an approximation. If the percentage change islarge, may want to compare with exact formulation
Michael R. Roberts Linear Regression 67/129
-
7/29/2019 Linear Regression Slides
68/129
Univariate Regression
Multivariate RegressionSpecification Issues
Inference
Mechanics and Interpretation
OLS Estimator PropertiesFurther IssuesBinary Independent Variables
Ordinal Variables
-
7/29/2019 Linear Regression Slides
69/129
Consider credit ratings: CR (AAA, AA,..., C, D)If we want to explain bond interest rates with ratings, we couldconvert CR to a numeric scale, e.g., AAA = 1, AA = 2,... and run
IRi = 0 + 1CRi + ui
This assumes a constant linear relation between interest rates andevery rating category.
Moving from AAA to AA produces the same change in interest rates
as moving from BBB to BB.Could take log interest rate, but is same proportionate change muchbetter?
Michael R. Roberts Linear Regression 69/129
Univariate Regression
Multivariate RegressionSpecification Issues
Inference
Mechanics and Interpretation
OLS Estimator PropertiesFurther IssuesBinary Independent Variables
Converting Ordinal Variables to Binary
-
7/29/2019 Linear Regression Slides
70/129
g y
Or we could create an indicator for each rating category, e.g.,CRAAA = 1 if CR = AAA, 0 otherwise; CRAA = 1 if CR = AA, 0otherwise, etc.
Run this regression:
IRi = 0 + 1CRAAA + 2CRAA + ... + m1CRC + ui
remembering to exclude one ratings category (e.g., D)
This allows the IR change from each rating category to have a
different magnitudeEach coefficient is the different in IRs between a bond with a certaincredit rating (e.g., AAA, BBB, etc.) and a bond with a ratingof D (the omitted category).
Michael R. Roberts Linear Regression 70/129
Univariate Regression
Multivariate RegressionSpecification Issues
Inference
Mechanics and Interpretation
OLS Estimator PropertiesFurther IssuesBinary Independent Variables
Interactions Involving Binary Variables I
-
7/29/2019 Linear Regression Slides
71/129
g y
Recall the regression with four categories based on (1) marriagestatus and (2) sex.
log(wage) = 0.321 + 0.213marriedMale 0.198marriedFemale+
0.110singleFemale+ 0.079education
We can capture the same logic using interactions
log(wage) = 0.321 0.110female + 0.213married
+ 0.301femaile married + ...Note excluded category can be found by setting all dummies = 0
= excluded category = single (married = 0), male (female = 0)
Michael R. Roberts Linear Regression 71/129
Univariate Regression
Multivariate RegressionSpecification Issues
Inference
Mechanics and Interpretation
OLS Estimator PropertiesFurther IssuesBinary Independent Variables
Interactions Involving Binary Variables II
-
7/29/2019 Linear Regression Slides
72/129
Note that the intercepts are all identical to the original regression.Intercept for married male
log(wage) = 0.321 0.110(0) + 0.213(1)
0.301(0) (1) = 0.534Intercept for single female
log(wage) = 0.321 0.110(1) + 0.213(0) 0.301(1) (0) = 0.211
And so on.
Note that the slopes will be identical as well.
Michael R. Roberts Linear Regression 72/129
Univariate Regression
Multivariate RegressionSpecification Issues
Inference
Mechanics and Interpretation
OLS Estimator PropertiesFurther IssuesBinary Independent Variables
Example: Wages and Computers
-
7/29/2019 Linear Regression Slides
73/129
Krueger (1993),N
= 13, 379 from 1989 CPSlog(wage) = beta0 + 0.177compwork + 0.070comphome
+ 0.017compwork comphome + ...(Intercept not reported)
Base category = people with no computer at work or home
Using a computer at work is associated with a 17.7% higher wage.(Exact value is 100(exp(0.177) - 1) = 19.4%)
Using a computer at home but not at work is associated with a
7.0% higher wage.Using a computer at home and work is associated with a100(0.177+0.070+0.017) = 26.4% (Exact value is100(exp(0.177+0.070+0.017) - 1) = 30.2%)
Michael R. Roberts Linear Regression 73/129
Univariate Regression
Multivariate RegressionSpecification Issues
Inference
Mechanics and Interpretation
OLS Estimator PropertiesFurther IssuesBinary Independent Variables
Different Slopes
-
7/29/2019 Linear Regression Slides
74/129
Dummies only shift intercepts for different groups.
What about slopes? We can interact continuous variables withdummies to get different slopes for different groups. E.g,
log(
wage) = 0 + 0
female+ 1
educ+ 1
educ
female+
u
log(wage) = (0 + 0female) + (1 + 1female)educ + u
Males: Intercept = 0, slope = 1
Females: Intercept = 0 + 0, slope = 1 + 1
= 0 measures difference in intercepts between males and females= 1 measures difference in slopes (return to education) between
males and females
Michael R. Roberts Linear Regression 74/129
Univariate Regression
Multivariate RegressionSpecification Issues
Inference
Mechanics and Interpretation
OLS Estimator PropertiesFurther IssuesBinary Independent Variables
Figure: Different Slopes I
-
7/29/2019 Linear Regression Slides
75/129
log(wage) = (0 + 0female) + (1 + 1female)educ + u
Michael R. Roberts Linear Regression 75/129
Univariate Regression
Multivariate RegressionSpecification Issues
Inference
Mechanics and Interpretation
OLS Estimator PropertiesFurther IssuesBinary Independent Variables
Figure: Different Slopes I
-
7/29/2019 Linear Regression Slides
76/129
log(wage) = (0 + 0female) + (1 + 1female)educ + u
Michael R. Roberts Linear Regression 76/129
Univariate Regression
Multivariate RegressionSpecification IssuesInference
Mechanics and Interpretation
OLS Estimator PropertiesFurther IssuesBinary Independent Variables
Interpretation of Figures
-
7/29/2019 Linear Regression Slides
77/129
1st figure: intercept and slope for women are less than those for men
= women earn less than men at all educational levels2nd figure: intercept for women is less than that for men, but slopeis larger
=
women earn less than men at low educational levels but the gap
narrows as education increases.= at some point, woman earn more than men. But, does this point
occur within the range of data?
Point of equality: Set Women eqn = Men eqn
Women: log(wage) = (0 + 0) + (1 + 1)educ + u
Men: log(wage) = (0) + 1educ + u
= e = 0/1Michael R. Roberts Linear Regression 77/129
Univariate Regression
Multivariate RegressionSpecification IssuesInference
Mechanics and Interpretation
OLS Estimator PropertiesFurther IssuesBinary Independent Variables
Example 1
-
7/29/2019 Linear Regression Slides
78/129
Consider N = 526, R2 = 0.441
log(wage) = 0.389 0.227female + 0.082educ 0.006female educ + 0.29exper 0.0006exper2 + ...
Return to education for men = 8.2%, women = 7.6%.
Women earn 22.7% less than men. But statistically insignif...why?Problem is multicollinearity with interaction term.
Intuition: coefficient on female measure wage differential betweenmen and women when educ = 0.Few people have very low levels of educ so unsurprising that we cant
estimate this coefficient precisely.More interesting to estimate gender differential at educ, for example.Just replace female educ with female (educ educ) and rerunregression. This will only change coefficient on female and itsstandard error.
Michael R. Roberts Linear Regression 78/129
Univariate Regression
Multivariate RegressionSpecification IssuesInference
Mechanics and Interpretation
OLS Estimator PropertiesFurther IssuesBinary Independent Variables
Example 2
-
7/29/2019 Linear Regression Slides
79/129
Consider baseball players salaries N = 330, R2 = 0.638
log(salary) = 10.34 + 0.0673years + 0.009gamesyr + ...
0.198black 0.190hispan+ 0.0125black percBlack+ 0.0201hispan percHisp
Black players in cities with no blacks (percBlack = 0) earn 19.8%less than otherwise identical whites.
As percBlack inc ( = percWhite dec since perchisp is fixed),black salaries increase relative to that for whites. E.g., ifpercBalck = 10% =
blacks earn -0.198 + 0.0125(10) = -0.073,
7.3% less than whites in such a city.
When percBlack = 20% = blacks earn 5.2% more than whites.Does this = discrimination against whites in cities with largeblack pop? Maybe best black players choose to live in such cities.
Michael R. Roberts Linear Regression 79/129
Univariate Regression
Multivariate RegressionSpecification IssuesInference
Functional Form Misspecification
Using Proxies for Unobserved VariablesRandom Coefficient ModelsMeasurement Error
Single Parameter Tests
-
7/29/2019 Linear Regression Slides
80/129
Any misspecification in the functional form relating dependentvariable to the independent variables will lead to bias.
E.g., assume true model is
y = 0 + 1x1 + 2x2 + 3x22 + u
but we omit squared term, x22 .
Amount of bias in (0, 1, 2) depends on size of 3 and correlationamong (x1, x2, x
22 )
Incorrect functional form on the LHS will bias results as well (e.g.,log(y) vs. y)
This is a minor problem in one sense: we have all the sufficient data,so we can try/test as many different functional forms as we like.
This is different from a situation where we dont have data for arelevant variable.
Michael R. Roberts Linear Regression 80/129
Univariate Regression
Multivariate RegressionSpecification IssuesInference
Functional Form Misspecification
Using Proxies for Unobserved VariablesRandom Coefficient ModelsMeasurement Error
RESET
-
7/29/2019 Linear Regression Slides
81/129
Regression Error Sepecification Test (RESET)Estimate
y = 0 + 1x1 + ... + kxk + u
Compute predicted values yEstimate
y = 0 + 1x1 + ... + kxk + 1y2 + 2y
3 + u
(choice of polynomial is arbitrary.)
H0 : 1 = 2 = 0
Use F-test with F F2,nk3
Michael R. Roberts Linear Regression 81/129 Univariate Regression
Multivariate RegressionSpecification IssuesInference
Functional Form Misspecification
Using Proxies for Unobserved VariablesRandom Coefficient ModelsMeasurement Error
Tests Against Nonnested Alternatives
-
7/29/2019 Linear Regression Slides
82/129
What if we wanted to test 2 nonnested models? I.e., we cantsimply restrict parameters in one model to obtain the other.
E.g.,
y = 0 + 1x1 + 2x2 + u
vs.
y = 0 + 1log(x1) + 2log(x2) + u
E.g.,
y = 0 + 1x1 + 2x2 + u
vs.
y = 0 + 1x1 + 2z + u
Michael R. Roberts Linear Regression 82/129 Univariate Regression
Multivariate RegressionSpecification IssuesInference
Functional Form Misspecification
Using Proxies for Unobserved VariablesRandom Coefficient ModelsMeasurement Error
Davidson-MacKinnon Test
-
7/29/2019 Linear Regression Slides
83/129
Test
Model 1: y = 0 + 1x1 + 2x2 + u
Model 2: y = 0 + 1log(x1) + 2log(x2) + u
If 1st model is correct, then fitted values from 2nd model, (y),
should be insignificant in 1st modelLook at t-stat on 1 in
y = 0 + 1x1 + 2x2 + 1y + u
Significant 1 = rejection of 1st model.Then do reverse, look at t-stat on 1 in
y = 0 + 1log(x1) + 2log(x2) + 1y + u
where y are predicted values from 1st model.Significant 1 = rejection of 2nd model.
Michael R. Roberts Linear Regression 83/129 Univariate Regression
Multivariate RegressionSpecification IssuesInference
Functional Form Misspecification
Using Proxies for Unobserved VariablesRandom Coefficient ModelsMeasurement Error
Davidson-MacKinnon Test: Comments
-
7/29/2019 Linear Regression Slides
84/129
Clear winner need not emerge. Both models could be rejected orneither could be rejected.
In latter case, could use R
2
to choose.Practically speaking, if the effects of key independent variables on yare not very different, the it doesnt really matter which model isused.
Rejecting one model does not imply that the other model is correct.
Michael R. Roberts Linear Regression 84/129 Univariate Regression
Multivariate RegressionSpecification IssuesInference
Functional Form Misspecification
Using Proxies for Unobserved VariablesRandom Coefficient ModelsMeasurement Error
Omitted Variables
-
7/29/2019 Linear Regression Slides
85/129
Consider
log(wage) = 0 + 1educ + 2exper + 3ability + u
We dont observe or cant measure ability.
= coefficients are unbiased.What can we do?
Find a proxy variable, which is correlated with the unobserved
variable. E.g., IQ.
Michael R Roberts Linear Regression 85/129 Univariate Regression
Multivariate RegressionSpecification IssuesInference
Functional Form Misspecification
Using Proxies for Unobserved VariablesRandom Coefficient ModelsMeasurement Error
Proxy Variables
-
7/29/2019 Linear Regression Slides
86/129
Consider
y = 0 + 1x1 + 2x2 + 3x
3 + u
x3 is unobserved but we have proxy, x3
x3 should be related to x3 :
x3 = 0 + 1x3 + v3
where v3 is error associated with the proxys imperfect
representation of x3Intercept is just there to account for different scales (e.g., abilitymay have a different average value than IQ)
Michael R Roberts Linear Regression 86/129 Univariate Regression
Multivariate RegressionSpecification IssuesInference
Functional Form Misspecification
Using Proxies for Unobserved VariablesRandom Coefficient ModelsMeasurement Error
Plug-In Solution to Omitted Variables I
-
7/29/2019 Linear Regression Slides
87/129
Can we just substitute x3 for x3 ? (and run
y = 0 + 1x1 + 2x2 + 3x3 + u
Depends on the assumptions on u and v3.
1 E(u|x1, x2, x3 ) = 0 (Common assumption). In addition,E(u|x3) = 0 = x3 is irrelevant once we control for (x1, x2, x3 )(Need this but not controversial given 1st assumption and status ofx3 as a proxy
2 E(v3|x1, x2, x3) = 0. This requires x3 to be a good proxy for x3E(x3 |x1, x2, x3) = E(x3 |x3) = 0 + 1x3
Once we control for x3, x
3 doesnt depend on x1 or x2
Michael R Roberts Linear Regression 87/129 Univariate Regression
Multivariate RegressionSpecification IssuesInference
Functional Form Misspecification
Using Proxies for Unobserved VariablesRandom Coefficient ModelsMeasurement Error
Plug-In Solution to Omitted Variables II
-
7/29/2019 Linear Regression Slides
88/129
Recall true model
y = 0 + 1x1 + 2x2 + 3x
3 + u
Substitute for x3 in terms of proxy
y = (0 + 30) 0 + 1x1 + 2x2 + 33x3 + u+ 3v3 eAssumptions 1 & 2 on prev slide = E(e|x1, x2, x3) = 0 = wecan est.
y = 0 + 1x1 + 2x2 + 3x3 + eNote: we get unbiased (or at least consistent) estimators of(0, 1, 2, 3).
(0, 3) not identified.
Michael R Roberts Linear Regression 88/129 Univariate Regression
Multivariate RegressionSpecification IssuesInference
Functional Form Misspecification
Using Proxies for Unobserved VariablesRandom Coefficient ModelsMeasurement Error
Example 1: Plug-In Solution
-
7/29/2019 Linear Regression Slides
89/129
In wage example where IQ is a proxy for ability, the 2nd assumptionis
E(ability|educ, exper, IQ) = E(ability|IQ) = 0 + 3IQThis means that the average level of ability only changes with IQ,not with education or experience.
Is this true? Cant test but must think about it.
Michael R Roberts Linear Regression 89/129 Univariate Regression
Multivariate RegressionSpecification IssuesInference
Functional Form Misspecification
Using Proxies for Unobserved VariablesRandom Coefficient ModelsMeasurement Error
Example 1: Cont.
-
7/29/2019 Linear Regression Slides
90/129
If proxy variable doesnt satisfy the assumptions 1 & 2, well get
biased estimatesSuppose
x3 = 0 + 1x1 + 2x2 + 3x3 + v3
where E(v3|x1, x2, x3) = 0.Substitute into structural eqn
y = (0 + 30) + (1 + 31)x1 + (2 + 32)x2 + 33x3 + u+ 3v3
So when we estimate the regression:
y = 0 + 1x1 + 2x2 + 3x3 + ewe get consistent estimates of (0 + 30), (1 + 31),(2 + 32), and 33 assuming E(u+ 3v3|x1, x2, x3) = 0.Original parameters are not identified.
Michael R Roberts Linear Regression 90/129 Univariate Regression
Multivariate RegressionSpecification IssuesInference
Functional Form Misspecification
Using Proxies for Unobserved VariablesRandom Coefficient ModelsMeasurement Error
Example 2: Plug-In Solution
-
7/29/2019 Linear Regression Slides
91/129
Consider q-theory of investment
Inv = 0 + 1q+ u
Cant measure q so use proxy, market-to-book (MB),
q = 0 + 1MB + v
Think about identifying assumptions1 E(u|q) = 0 theory say q is sufficient statistic for inv2 E(q|MB) = 0 + 1MB = avg level of q changes only with MB
Even if assumption 2 true, were not estimating 1 in
Inv = 0 + 1MB + e
Were estimating (0, 1) where
Inv = (0 + 10)
0
+ 11
1
MB + e
Michael R Roberts Linear Regression 91/129 Univariate Regression
Multivariate RegressionSpecification Issues
Inference
Functional Form MisspecificationUsing Proxies for Unobserved VariablesRandom Coefficient ModelsMeasurement Error
Using Lagged Dependent Variables as Proxies
-
7/29/2019 Linear Regression Slides
92/129
Lets say we have no idea how to proxy for an omitted variable.
One way to address is to use the lagged dependent variable, whichcaptures inertial effects of all factors that affect y.
This is unlikely to solve the problem, especially if we only have onecross-section.
But, we can conduct the experiment of comparing to observationswith the same value for the outcome variable last period.
This is imperfect, but it can help when we dont have panel data.
Michael R Roberts Linear Regression 92/129 Univariate Regression
Multivariate RegressionSpecification Issues
Inference
Functional Form MisspecificationUsing Proxies for Unobserved VariablesRandom Coefficient ModelsMeasurement Error
Model I
-
7/29/2019 Linear Regression Slides
93/129
Consider an extension to the basic model
yi = i + ixi
where i is an unobserved intercept and the return to educationdiffers for each person.
This model is unidentified: more parameters (2n) than observations(n)
But we can hope to identify avg intercept, E(i) = , and avgslope, E(i) = (a.k.a., Average Partial Effect (APE).
i = +ci, i = +
di
where ci and di are the individual specific deviation from averageeffects.
= E(ci) = E(di) = 0Michael R Roberts Linear Regression 93/129
Univariate RegressionMultivariate Regression
Specification IssuesInference
Functional Form MisspecificationUsing Proxies for Unobserved VariablesRandom Coefficient ModelsMeasurement Error
Model II
-
7/29/2019 Linear Regression Slides
94/129
Substitute coefficient specification into model
yi = + xi + ci + dixi + xi + uiWhat we need for unbiasedness is E(ui|xi) = 0
E(ui|xi) = E(ci + dixi|xi)This amounts to requiring
1 E(ci|xi) = E(ci) = 0 = E(i|xi) = E(i)2 E(di|xi) = E(di) = 0 = E(i|xi) = E(i)
Understand these assumptions!!!! In order for OLS to consistentlyestimate the mean slope and intercept, the slopes and interceptsmust be mean independent (at least uncorrelated) of theexplanatory variable.
Michael R Roberts Linear Regression 94/129 Univariate Regression
Multivariate RegressionSpecification Issues
Inference
Functional Form MisspecificationUsing Proxies for Unobserved VariablesRandom Coefficient ModelsMeasurement Error
What is Measurement Error (ME)?
-
7/29/2019 Linear Regression Slides
95/129
When we use an imprecise measure of an economic variable in aregression, our model contains measurement error (ME)
The market-to-book ratio is a noisy measure of qAltmans Z-score is a noisy measure of the probability of defaultAverage tax rate is a noisy measure of marginal tax rate
Reported income is noisy measure of actual incomeSimilar statistical structure to omitted variable-proxy variablesolution but conceptually different
Proxy variable case we need variable that is associated withunobserved variable (e.g., IQ proxy for ability)
Measurement error case the variable we dont observe has awell-defined, quantitative meaning but our recorded measure containserror
Michael R Roberts Linear Regression 95/129 Univariate Regression
Multivariate RegressionSpecification Issues
Inference
Functional Form MisspecificationUsing Proxies for Unobserved VariablesRandom Coefficient ModelsMeasurement Error
Measurement Error in Dependent Variable
-
7/29/2019 Linear Regression Slides
96/129
Let y be observed measure of y
y = 0 + 1x1 + ... + kxk + u
Measurement error defined as e0 = y yEstimable model is:
y = 0 + 1x1 + ... + kxk + u+ e0
If mean of ME = 0, intercept is biased so assume mean = 0If ME independent of X, then OLS is unbiased and consistent and
usual inference valid.If e0 and u uncorrelated than Var(u+ e0) > Var(u) =measurement error in dependent variable results in larger errorvariance and larger coef SEs
Michael R Roberts Linear Regression 96/129 Univariate Regression
Multivariate RegressionSpecification Issues
Inference
Functional Form MisspecificationUsing Proxies for Unobserved VariablesRandom Coefficient ModelsMeasurement Error
Measurement Error in Log Dependent Variable
-
7/29/2019 Linear Regression Slides
97/129
When log(y) is dependent variable, we assume
log(y) = log(y) + e0
This follows from multiplicative ME
y = ya0
where
a0 > 0e0 = log(a0)
Michael R Roberts Linear Regression 97/129 Univariate Regression
Multivariate RegressionSpecification Issues
Inference
Functional Form MisspecificationUsing Proxies for Unobserved VariablesRandom Coefficient ModelsMeasurement Error
Measurement Error in Independent Variable
-
7/29/2019 Linear Regression Slides
98/129
Model
y = 0 + 1x
1 + u
ME defined as e1 = x1
x1
AssumeMean ME = 0u x1 , x1, or E(y|x1 , x1) = E(y|x1 ) (i.e., x1 doesnt affect y aftercontrolling for x1 )
What are implications of ME for OLS properties?
Depends crucially on assumptions on e1
Econometrics has focused on 2 assumptions
Mi h l R R b t Li R i 98/129 Univariate Regression
Multivariate RegressionSpecification Issues
Inference
Functional Form MisspecificationUsing Proxies for Unobserved VariablesRandom Coefficient ModelsMeasurement Error
Assumption 1: e1 x1
-
7/29/2019 Linear Regression Slides
99/129
1s
t assumption is ME uncorrelated with observed measureSince e1 = x1 x1 , this implies e1 x1Substitute into regression
y = 0 + 1x1 + (u
1e1)
We assumed u and e1 have mean 0 and are with x1= (u 1e1) is uncorrelatd with x1.= OLS with x1 produces consistent estimator of coefs= OLS error variance is 2u + 21 2e1
ME increases error variance but doesnt affect any OLS properties(except coef SEs are bigger)
Mi h l R R b t Li R i 99/129 Univariate Regression
Multivariate RegressionSpecification Issues
Inference
Functional Form MisspecificationUsing Proxies for Unobserved VariablesRandom Coefficient ModelsMeasurement Error
Assumption 2: e1 x1This is the Classical Errors in Variables (CEV) assumption and
-
7/29/2019 Linear Regression Slides
100/129
This is the Classical Errors-in-Variables (CEV) assumption and
comes from representation:
x1 = x
1 + e1
(Still maintain 0 correlation between u and e1)
Note e1 x
1 =Cov(x1, e1) = E(x1e1) = E(x
1 e1) + E(e21 ) =
2e1
This covariance causes problems when we use x1 in place of x
1 since
y = 0
+ 1
x1
+ (u
1
e1
) and
Cov(x1, u 1e1) = 12e1I.e., indep var is correlatd with error = bias and inconsistent OLSestimates
Michael R. Roberts Linear Regression 100/129
Univariate RegressionMultivariate Regression
Specification IssuesInference
Functional Form MisspecificationUsing Proxies for Unobserved Variables
Random Coefficient ModelsMeasurement Error
Assumption 2: e1 x1 (Cont.)
-
7/29/2019 Linear Regression Slides
101/129
Amount of inconsistency in OLS
plim(1) = 1 +Cov(x1, u 1e1)
Var(x1)
= 1 +
12e1
2x1+ 2e1
= 1
1
2e1
2x1+ 2e1
= 1 2x12x1
+ 2e1
Michael R. Roberts Linear Regression 101/129
Univariate RegressionMultivariate Regression
Specification IssuesInference
Functional Form MisspecificationUsing Proxies for Unobserved Variables
Random Coefficient ModelsMeasurement Error
CEV asymptotic bias
From previous slide:
-
7/29/2019 Linear Regression Slides
102/129
From previous slide:
plim(1) = 1 2x1
2x1+ 2e1
Scale factor is always < 1 = asymptotic bias attenuatesestimated effect (attenuation bias)If variance of error (2e1 ) is small relative to variance of unobservedfactor, then bias is small.
More than 1 explanatory variable and bias is less clear
Correlation between e1 and x1 creates problem. If x1 correlated with
other variables, bias infects everything.Generally, measurement error in a single variable casuesinconsistency in all estimators. Sizes and even directions of thebiases are not obvious or easily derived.
Michael R. Roberts Linear Regression 102/129
Univariate RegressionMultivariate Regression
Specification IssuesInference
Functional Form MisspecificationUsing Proxies for Unobserved Variables
Random Coefficient ModelsMeasurement Error
Counterexample to CEV Assumption
-
7/29/2019 Linear Regression Slides
103/129
Consider
colGPA = 0 + 1smoked + 2hsGPA + u
smoked = smoked + e1
where smoked
is actual # of times student smoked marijuana andsmoked is reported
For smoked = 0 report is likely to be 0 = e1 = 0For smoked > 0 report is likely to be off = e1 = 0
= e1 and smoked
are correlated estimated effect (attenuation bias)I.e., CEV Assumption does not hold
Tough to figure out implications in this scenario
Michael R. Roberts Linear Regression 103/129
Univariate RegressionMultivariate Regression
Specification IssuesInference
Regression Errors
HeteroskedasticityHypothesis Tests
Statistical Properties
-
7/29/2019 Linear Regression Slides
104/129
At a basic level, regression is just math (linear algebra andprojection methods)
We dont need statistics to run a regression (i.e., compute
coefficients, standard errors, sums-of-squares, R2, etc.)What we need statistics for is the interpretation of these quantities(i.e., for statistical inference).
From the regression equation, the statistical properties of y come
from those of X and u
Michael R. Roberts Linear Regression 104/129
Univariate RegressionMultivariate Regression
Specification IssuesInference
Regression Errors
HeteroskedasticityHypothesis Tests
What is heteroskedasticity (HSK)?
-
7/29/2019 Linear Regression Slides
105/129
Non-constant variance, thats it.
HSK has no effect on bias or consistency properties of OLS
estimatorsHSK means OLS estimates are no longer BLUE
HSK means OLS estimates of standard errors are incorrect
We need an HSK-robust estimator of the variance of the coefficients.
Michael R. Roberts Linear Regression 105/129
-
7/29/2019 Linear Regression Slides
106/129
Univariate RegressionMultivariate Regression
Specification IssuesInference
Regression Errors
HeteroskedasticityHypothesis Tests
HSK-Robust LM-Statistics
-
7/29/2019 Linear Regression Slides
107/129
The recipe:1 Get residuals from restricted model u2 Regress each independent variable excluded under null on all of the
included independent variables; q excluded variables =
(r1, ..., rq)
3 Compute the products between each vector rj and u4 Regression of 1 (a constant 1 for each observation) on all of the
products rju without an intercept5 HSK-robust LM statistic, LM, is N SSR1, where SSR1 is the sum
of squared residuals from this last regression.
6 LM is asymptotically distributed 2q
Michael R. Roberts Linear Regression 107/129
Univariate RegressionMultivariate Regression
Specification IssuesInference
Regression Errors
HeteroskedasticityHypothesis Tests
Testing for HSK
The model
-
7/29/2019 Linear Regression Slides
108/129
y = 0 + 1x1 + ... + kxk + u
Test H0 : Var(y|x1,..., xk) = 2E(u|x1, ..., xk) = 0 = this hypothesis is equivalent toH0 : E(u
2
|x1,..., xk) =
2 (I.e., is u2 related to any explanatoryvariables?)
u2 = 0 + 1x1 + ... + kxk + u
Test null H0 : 1 = ... = k = 0
F-test : F =R2u2
(1 R2u2
/(n k 1)LM-test : LM = N R2u2 (BP-test sort of)
Michael R. Roberts Linear Regression 108/129
Univariate RegressionMultivariate Regression
Specification IssuesInference
Regression Errors
HeteroskedasticityHypothesis Tests
Weighted Least Squares (WLS)
Pre HSK-robust statistics, we did WLS - more efficient than OLS if
-
7/29/2019 Linear Regression Slides
109/129
Pre HSK robust statistics, we did WLS more efficient than OLS if
correctly specified variance form
Var(u|X) = 2h(X), h(X) > 0X
E.g., h(X) = x21 or h(x) = exp(x)
WLS just normalizes all of the variables by the square root of thevariance fxn (
h(X)) and runs OLS on transformed data.
yi/
h(Xi) = 0/
h(Xi) + 1/(xi1/
h(Xi)) + ...
+ k/(xik/h(Xi)) + ui/h(Xi)yi = 0x
0 + 1x
1 + ... + kx
k + u
where x0 = 1/
h(Xi)
Michael R. Roberts Linear Regression 109/129
-
7/29/2019 Linear Regression Slides
110/129
Univariate RegressionMultivariate Regression
Specification IssuesInference
Regression Errors
HeteroskedasticityHypothesis Tests
Feasible Generalized Least Squares (FGLS) Recipe
-
7/29/2019 Linear Regression Slides
111/129
Consider variance form:
Var(u|X) = 2exp(0 + 1x1 + ... + kxk)FGLS to correct for HSK:
1
Regress y on X and get residuals u2 Regress log(u2) on X and get fitted values g3 Estimate by WLS
y = 0 + 1x1 + ... + kxk + u
with weights 1/exp(g), or transform each variable (includingintercept) by multiplying by 1/exp(g) and estimate via OLS
FGLS estimate is biased but consistent and more efficient than OLS.
Michael R. Roberts Linear Regression 111/129
Univariate RegressionMultivariate Regression
Specification IssuesInference
Regression ErrorsHeteroskedasticityHypothesis Tests
OLS + Robust SEs vs. WLS
-
7/29/2019 Linear Regression Slides
112/129
If coefficient estimates are very different across OLS and WLS, itslikely E(y|x) is misspecified.If we get variance form wrong in WLS then
1 WLS estimates are still unbiased and consistent2 WLS standard errors and test statistics are invalid even in large
samples3 WLS may not be more efficient than OLS
Michael R. Roberts Linear Regression 112/129
Univariate RegressionMultivariate Regression
Specification IssuesInference
Regression ErrorsHeteroskedasticityHypothesis Tests
Single Parameter Tests
Model
-
7/29/2019 Linear Regression Slides
113/129
y = 0 + 1x1 + 2x2 + ... + kxk + u
Under certain assumptions
t(j) =j jse(j)
tnk1
Under other assumptions, asymptotically ta N(0, 1)
Intuition: t(j) tells us how far in standard deviations our
estimate j is from the hypothesized value (j)
E.g., H0 : j = 0 = t = j/se(j)E.g., H0 : j = 4 = t = (j 4)/se(j)
Michael R. Roberts Linear Regression 113/129
Univariate RegressionMultivariate Regression
Specification IssuesInference
Regression ErrorsHeteroskedasticityHypothesis Tests
Statistical vs. Economic Significance
-
7/29/2019 Linear Regression Slides
114/129
These are not the same thing
We can have a statistically insignificant coefficient but it may beeconomically large.
Maybe we just have a power problem due to a small sample size, or
little variation in the covariateWe can have a statistically significant coefficient but it may beeconomically irrelevant.
Maybe we have a very large sample size, or we have a lot of variationin the covariate (outliers)
You need to think about both statistical and economic significancewhen discussing your results.
Michael R. Roberts Linear Regression 114/129
Univariate RegressionMultivariate Regression
Specification IssuesInference
Regression ErrorsHeteroskedasticityHypothesis Tests
Testing Linear Combinations of Parameters I
Model
-
7/29/2019 Linear Regression Slides
115/129
y = 0 + 1x1 + 2x2 + ... + kxk + u
Are two parameters the same? I.e.,H0 : 1 = 2 (1 2) = 0The usual statistic can be slightly modified
t =1 2
se(1 2) tnk1
Careful: when computing the SE of difference not to forgetcovariance term
se(1 2) =
se(1)2 + se(2)
2 2Cov(1, 2)1/2
Michael R. Roberts Linear Regression 115/129
Univariate RegressionMultivariate Regression
Specification IssuesInference
Regression ErrorsHeteroskedasticityHypothesis Tests
Testing Linear Combinations of Parameters II
-
7/29/2019 Linear Regression Slides
116/129
Instead of dealing with computing the SE of difference, canreparameterize the regression and just check a t-stat
E.g., define = 1 2 = 1 = + 2 and
y = 0 + ( + 2)x1 + 2x2 + ... + kxk + u
= 0 + x1 + 2(x1 + x2) + ... + kxk + u
Just run a t-test of new null, H0 : = 0 same as previous slide
This strategy always works.
Michael R. Roberts Linear Regression 116/129
Univariate RegressionMultivariate Regression
Specification IssuesInference
Regression ErrorsHeteroskedasticityHypothesis Tests
Testing Multiple Linear Restrictions
Consider H0 : 1 = 0, 2 = 0, 3 = 0 (a.k.a., exclusion
-
7/29/2019 Linear Regression Slides
117/129
restrictions), H1 : H0nottrueTo test this, we need a joint hypothesis testOne such test is as follows:
1 Estimate the Unrestricted Model
y = 0 + 1x1 + 2x2 + ... + kxk + u
2 Estimate the Restricted Model
y = 0 + 4x4 + 5x5 + ... + kxk + u
3 Compute F-statistic
F = SSRR SSRU)/qSSRU/(n k 1) Fq,nk1
where q = degrees of freedom (df) in numerator = dfR dfU,n k 1 = df in denominator = dfU,
Michael R. Roberts Linear Regression 117/129
Univariate RegressionMultivariate Regression
Specification IssuesInference
Regression ErrorsHeteroskedasticityHypothesis Tests
Relationship Between F and t Statistics
t2
has an F1,nk1 distribution.
-
7/29/2019 Linear Regression Slides
118/129
n k 1,
All coefficients being individually statistically significant (significantt-stats) does not imply that they are jointly significant
All coefficients being individually statistically insignificant(insignificant t-stats) does not imply that they are jointly
insignificantR2 form of the F-stat:
F =R2U R2R)/q
(1
R2U)/(n
k
1)
(Equivalent to previous formula.)
Regression F-Stat tests H0 : 1 = 2 = ... = k = 0
Michael R. Roberts Linear Regression 118/129
Univariate RegressionMultivariate Regression
Specification IssuesInference
Regression ErrorsHeteroskedasticityHypothesis Tests
Testing General Linear Restrictions I
Can write any set of linear restrictions as follows
-
7/29/2019 Linear Regression Slides
119/129
H0 : R q = 0H1 : R q = 0
dim(R) = # of restrictions # of parameters. E.g.,H0 : j = 0 = R = [0, 0,..., 1, 0, ..., 0], q = 0H0 : j = k = R = [0, 0, 1,..., 1, 0, ..., 0], q = 0H0 : 1 + 2 + 3 = 1 = R = [1, 1, 1, 0,..., 0], q = 1H0 : 1 = 0, 2 = 0, 3 = 0 =
R =
1 0 0 0 ... 00 1 0 0 ... 00 0 1 0 ... 0
, q = 000
Michael R. Roberts Linear Regression 119/129
Univariate RegressionMultivariate Regression
Specification IssuesInference
Regression ErrorsHeteroskedasticityHypothesis Tests
Testing General Linear Restrictions II
Note that under the null hypothesis
-
7/29/2019 Linear Regression Slides
120/129
E(R q|X) = R0 q = 0Var(R q|X) = RVar(|X)R = 2R(XX)1R
Wald criterion:
W = (R q)[2R(XX)1R]1(R q) 2Jwhere J is the degrees of freedom under the null (i.e., the # ofrestrictions, the # of rows in R)
Must estimate 2, this changes distribution
F = (R q)[2R(XX)1R]1(R q) FJ,nk1where the n k 1 are df of the denominator (2)
Michael R. Roberts Linear Regression 120/129
Univariate RegressionMultivariate Regression
Specification IssuesInference
Regression ErrorsHeteroskedasticityHypothesis Tests
Differences in Regression Function Across Groups I
Consider
-
7/29/2