Upload: alannah-sims

Post on 25-Dec-2015

3. Multiple Regression Analysis: Estimation

-Although bivariate linear regressions are sometimes useful, they are often unrealistic

-SLR.4, that all factors affecting y are uncorrelated with x, is often violated

-MULTIPLE REGRESSION ANALYSIS allows us to explicitly control factors to obtain a Ceteris Paribus situation

-this allows us to infer causality better than a bivariate regression

-multiple regression analysis includes more variables, therefore explaining more of the variation in y

-multiple regression analysis can also “incorporate fairly general functional form relationships”

-it’s more flexible

3. Multiple Regression Analysis: Estimation

3.1 Motivation for Multiple Regression

3.2 Mechanics and Interpretation of Ordinary Least Squares

3.3 The Expected Value of the OLS Estimators

3.4 The Variance of the OLS Estimators

3.5 Efficiency of OLS: The Gauss-Markov Theorem

3.1 Motivation for Multiple Regression

Take the bivariate regression:

(ie) $\text{Moviequality} = \beta_0 + \beta_1 \text{Plot} + u$

-where u takes into account other factors affecting movie quality, such as the characters

-for this regression to be valid, we have to assume that characters are uncorrelated with the plot, which is a poor assumption

-since u is correlated with Plot, the estimate of B1 is biased and we can’t isolate the Ceteris Paribus effect of plot on movie quality

Take the multiple variable regression:

(ie) $\text{Moviequality} = \beta_0 + \beta_1 \text{Plot} + \beta_2 \text{Character} + u$

-we still need to be concerned about u’s correlation with Character and Plot BUT…

-by including Character in the regression we ensure we can examine Plot’s effect with Character held constant (B1)

-we can also analyze Character’s effect on movie quality with Plot held constant (B2)
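The bias described above can be illustrated with a small simulation. This is only a sketch under assumed values: the "true" coefficients (1, 2, 3), the 0.8 link between Plot and Character, and the sample size are made up for illustration and do not come from the slides. It uses plain Python so that the arithmetic stays visible.

```python
import random


def ols_one_regressor(x, y):
    """Intercept and slope from regressing y on a single x."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (c - my) for a, c in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def ols_two_regressors(x1, x2, y):
    """Closed-form OLS for y = b0 + b1*x1 + b2*x2 + u."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return my - b1 * m1 - b2 * m2, b1, b2


# Hypothetical data-generating process: Plot is correlated with Character.
rng = random.Random(0)
n = 5000
character = [rng.gauss(0, 1) for _ in range(n)]
plot = [0.8 * c + rng.gauss(0, 1) for c in character]
quality = [1 + 2 * p + 3 * c + rng.gauss(0, 1) for p, c in zip(plot, character)]

# Bivariate regression: Character is left in u, so the Plot slope absorbs part of its effect.
_, b1_simple = ols_one_regressor(plot, quality)

# Multiple regression: Character is controlled for explicitly.
_, b1_multiple, b2_multiple = ols_two_regressors(plot, character, quality)

print("Plot slope, Character omitted: ", round(b1_simple, 3))
print("Plot slope, Character included:", round(b1_multiple, 3))
```

With these assumed values the bivariate slope should land well above the true value of 2, while the multiple regression slope should be close to it.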

-“Multiple regression analysis is also useful for generalizing functional relationships between variables”:

(ie) $\text{Exammark} = \beta_0 + \beta_1 \text{Study} + \beta_2 \text{Study}^2 + u$

-here study time can impact exam mark in a direct and/or quadratic fashion

-this quadratic term affects how the parameters are interpreted

-you cannot examine Study’s effect on Exammark by holding Study^2 constant, because Study^2 changes whenever Study does

-the change in Exammark due to an extra hour of studying therefore becomes:

(ie) $\frac{\Delta \text{Exammark}}{\Delta \text{Study}} \approx \beta_1 + 2\beta_2 \text{Study}$

-the impact is no longer a constant (B1).

-while including one variable twice in multiple regression analysis allows it to have a more dynamic impact, it requires a more in-depth analysis of the coefficients estimated
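As a rough illustration of why the impact is no longer constant, the sketch below evaluates B1 + 2(B2)(Study) at several study levels. The coefficient values (40, 6, -0.25) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def exam_mark(b0, b1, b2, study):
    """Exammark implied by the quadratic model (error term set to zero)."""
    return b0 + b1 * study + b2 * study ** 2


def marginal_effect(b1, b2, study):
    """Approximate change in Exammark per extra hour at a given study level."""
    return b1 + 2 * b2 * study


# Hypothetical coefficients, not estimates from any data set.
B0, B1, B2 = 40.0, 6.0, -0.25

for hours in (0, 4, 12):
    print(hours, "hours studied -> effect of one more hour:", marginal_effect(B1, B2, hours))

# The derivative is an approximation; the exact change from 4 to 5 hours differs slightly.
exact_change = exam_mark(B0, B1, B2, 5) - exam_mark(B0, B1, B2, 4)
print("exact change from 4 to 5 hours:", exact_change)
```

With a negative B2 the payoff to an extra hour shrinks as study time rises, so B1 alone describes the effect only at Study = 0.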

-A simple model with two independent variables (x1 and x2) can be written as:

(3.3) $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u$

-where B1 examines x1’s impact on y and B2 examines x2’s impact on y

-a key assumption on how u is related to x1 and x2 is:

(3.5) $E(u \mid x_1, x_2) = 0$

-that is, the expected value of all unobserved impacts on y is zero given any x1 and x2

-as in the bivariate case, B0 can absorb any nonzero average of u, so setting the mean to zero costs nothing; the substantive part is that the mean of u does not change with x1 and x2

-in our movie example, this becomes:

(ie) $E(u \mid \text{Plot}, \text{Character}) = 0$

-in other words, other factors affecting movie quality (such as filming skill) are not related to plot or character

-in the quadratic case, this assumption is simplified:

(ie) $E(u \mid \text{Study}, \text{Study}^2) = 0 \iff E(u \mid \text{Study}) = 0$

-Study^2 is known once Study is known, so conditioning on it adds no information

3.1 Model with k Independent Variables

-in a regression with k independent variables, the MULTIPLE LINEAR REGRESSION MODEL or MULTIPLE REGRESSION MODEL of the population is:

(3.6) $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots + \beta_k x_k + u$

-B0 is the intercept, B1 relates to x1, B2 relates to x2, and so on

-k variables and an intercept give k+1 unknown parameters

-parameters other than the intercept are sometimes called SLOPE PARAMETERS

-in the multiple regression model (3.6):

-u is the error term or disturbance that captures all effects on y not included in the x’s

-some effects can’t be measured

-some effects aren’t expected

-y is the DEPENDENT, EXPLAINED, or PREDICTED variable

-the x’s are the INDEPENDENT, EXPLANATORY or PREDICTOR variables

-parameter interpretation is key in multiple regressions:

(ie) $\log(\text{mark}) = \beta_0 + \beta_1 \log(\text{ability}) + \beta_2 \text{study} + \beta_3 \text{study}^2 + u$

-here B1 is the ceteris paribus elasticity of mark with respect to ability

-if B3=0, then 100B2 is approximately the ceteris paribus percentage increase in mark when you study an extra hour

-if B3≠0, this is more complicated

-note that this equation is linear in the parameters even though mark and study have a non-linear relationship
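To see why 100B2 is only "approximately" the percentage change when B3=0, the sketch below compares it with the exact percentage change implied by a log dependent variable, 100(exp(B2) - 1). The value B2 = 0.05 is hypothetical.

```python
import math

# Hypothetical slope on study in the log(mark) equation, with B3 = 0.
b2 = 0.05

# Approximation used in the slides: 100 * B2 percent per extra hour.
approx_pct = 100 * b2

# Exact change: log(mark) rises by b2, so mark is multiplied by exp(b2).
exact_pct = 100 * (math.exp(b2) - 1)

print("approximate % change:", round(approx_pct, 3))
print("exact % change:      ", round(exact_pct, 3))
```

The two figures are close for small B2 and drift apart as B2 grows.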

-the key assumption with k independent variables becomes:

(3.8) $E(u \mid x_1, x_2, \dots, x_k) = 0$

-that is, ALL unobserved factors are uncorrelated with ALL explanatory variables

-anything that causes correlation between u and any explanatory variable causes (3.8) to fail

3.2 Mechanics and Interpretation of Ordinary Least Squares

-in a simple model with two independent variables, the OLS estimation is written as:

(3.9) $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2$

-where B0hat estimates B0, B1hat estimates B1 and B2hat estimates B2

-we obtain these estimates through the method of ORDINARY LEAST SQUARES which minimizes the sum of squared residuals:

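(3.10) $\min_{\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2} \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{i1} - \hat{\beta}_2 x_{i2})^2$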
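A minimal sketch of what "minimizes the sum of squared residuals" means in the two-variable case. The six data points are made up, and the closed-form solver is one standard way of solving the two-regressor problem, not necessarily the method used in the course. The check at the end nudges each estimate and confirms that the sum of squared residuals goes up.

```python
def ssr(b0, b1, b2, x1, x2, y):
    """Sum of squared residuals for candidate coefficients (b0, b1, b2)."""
    return sum((yi - b0 - b1 * a - b2 * b) ** 2 for a, b, yi in zip(x1, x2, y))


def ols_two_regressors(x1, x2, y):
    """Closed-form OLS for y = b0 + b1*x1 + b2*x2 + u."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return my - b1 * m1 - b2 * m2, b1, b2


# Made-up sample of n = 6 observations.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [4.1, 5.3, 9.2, 10.4, 14.9, 15.8]

b0, b1, b2 = ols_two_regressors(x1, x2, y)
best = ssr(b0, b1, b2, x1, x2, y)

# Moving any single estimate away from the OLS value should raise the SSR.
nudges = [(0.1, 0.0, 0.0), (0.0, -0.1, 0.0), (0.0, 0.0, 0.1)]
nudged = [ssr(b0 + d0, b1 + d1, b2 + d2, x1, x2, y) for d0, d1, d2 in nudges]

print("OLS estimates:", b0, b1, b2)
print("SSR at OLS:", best, "SSR after nudges:", nudged)
```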

3.2 Indexing Note

-when independent variables have two subscripts, the i refers to the observation number

-likewise the number (1 or 2, etc.) distinguishes between different variables

-for example, x54 indicates the 5th observation’s data for variable 4

-in this course, variables will be written generally as xij, where i refers to observation number and j refers to variable number

-this is not universal; other papers will use different conventions

3.2 K Independent Variables

-in a model with k independent variables, the OLS estimation is written as:

(3.11) $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \dots + \hat{\beta}_k x_k$

-where B0hat estimates B0, B1hat estimates B1 and B2hat estimates B2, etc.

-this is called the OLS REGRESSION LINE or SAMPLE REGRESSION FUNCTION (SRF)

-we still obtain k+1 OLS estimates by minimizing the sum of squared residuals:

(3.12) $\min_{\hat{\beta}_j} \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{i1} - \dots - \hat{\beta}_k x_{ik})^2$

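-using multivariable calculus (partial derivatives), this leads to k+1 equations in k+1 unknowns:

(3.13)

$\sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{i1} - \hat{\beta}_2 x_{i2} - \dots - \hat{\beta}_k x_{ik}) = 0$

$\sum_{i=1}^{n} x_{i1} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{i1} - \hat{\beta}_2 x_{i2} - \dots - \hat{\beta}_k x_{ik}) = 0$

$\sum_{i=1}^{n} x_{i2} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{i1} - \hat{\beta}_2 x_{i2} - \dots - \hat{\beta}_k x_{ik}) = 0$

...

$\sum_{i=1}^{n} x_{ik} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{i1} - \hat{\beta}_2 x_{i2} - \dots - \hat{\beta}_k x_{ik}) = 0$

-these are OLS’s FIRST ORDER CONDITIONS (FOC’s)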
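The first order conditions can be checked numerically. The sketch below is one possible way to do it, on made-up data: it rearranges the k+1 conditions into a linear system (often called the normal equations), solves it by Gaussian elimination, and then confirms that each sum in (3.13) is zero up to rounding. The helper names `solve_linear_system` and `ols` are illustrative, not from any library.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b] so row operations act on both sides.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("the first order conditions have no unique solution")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols(rows, y):
    """rows[i] holds (x_i1, ..., x_ik); returns [b0hat, b1hat, ..., bkhat]."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 for the intercept
    p = len(design[0])
    xtx = [[sum(d[a] * d[b] for d in design) for b in range(p)] for a in range(p)]
    xty = [sum(d[a] * yi for d, yi in zip(design, y)) for a in range(p)]
    return solve_linear_system(xtx, xty)


# Made-up sample with k = 2 regressors.
rows = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 6.0), (6.0, 5.0)]
y = [4.1, 5.3, 9.2, 10.4, 14.9, 15.8]

bhat = ols(rows, y)
residuals = [yi - bhat[0] - sum(bj * xj for bj, xj in zip(bhat[1:], r))
             for r, yi in zip(rows, y)]

# One entry per equation in (3.13): the plain sum, then one sum per regressor.
foc = [sum(residuals)]
foc += [sum(r[j] * e for r, e in zip(rows, residuals)) for j in range(len(rows[0]))]
print("estimates:", bhat)
print("first order conditions (should be about 0):", foc)
```

The solver raises an error when the system has no unique solution, for example if one regressor is an exact multiple of another.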

-these equations are sample counterparts of population moments from a method of moments estimation (we’ve omitted dividing by n), using the following assumptions, both of which follow from (3.8):

$E(u) = 0$

$E(x_j u) = 0, \quad j = 1, \dots, k$

-(3.13) is tedious to solve by hand, so we use statistics and econometric software

-the one requirement is that (3.13) can be solved uniquely for the Bjhat (this is an easy assumption)

-B0hat is called the OLS INTERCEPT ESTIMATE and B1hat to Bkhat the OLS SLOPE ESTIMATES

3.2 Interpreting the OLS Equation

-given a model with 2 independent variables (x1 and x2):

(3.14) $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2$

-B0hat is the predicted value of y when x1=0 and x2=0

-this is sometimes an interesting situation and other times impossible

-the intercept is still essential to the estimation, even if it is theoretically meaningless

-B1hat and B2hat have PARTIAL EFFECT or CETERIS PARIBUS interpretations:

$\Delta \hat{y} = \hat{\beta}_1 \Delta x_1 + \hat{\beta}_2 \Delta x_2$

-therefore given a change in x1 and x2, we can predict a change in y

-in addition, when the other x variable is held constant, we have:

$\Delta \hat{y} = \hat{\beta}_1 \Delta x_1$ (when x2 is held fixed)

and

$\Delta \hat{y} = \hat{\beta}_2 \Delta x_2$ (when x1 is held fixed)

3.2 Interpreting Example

-consider the theoretical model:

(ie) $\widehat{\text{intelligence}} = 80 + 5\,\text{HomeParent} + 0.5\,\text{Held}$

-where a person’s innate intelligence is a function of how many years a parent was home during their childhood and the average number of hours they were held as a child

-the intercept (80) estimates that a child with no stay-at-home parent who is never held will have an innate intelligence of 80

-B1hat estimates that a parent staying home for an extra year increases child intellect by 5, with hours held fixed

-B2hat estimates that a parent holding a child for on average an extra hour increases child intellect by 0.5, with years at home fixed

-if a parent stays home for an extra year, and as a result holds a child an extra hour on average, we would estimate their intellect to rise by 5.5 (5+0.5; 1(B1hat) + 1(B2hat))
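The arithmetic in this example can be written out as a short sketch. The coefficients 80, 5 and 0.5 are the ones from the example; the function name and the particular input values (2 and 3 years, 1 and 2 hours) are chosen only for illustration.

```python
def predicted_intelligence(home_parent_years, hours_held):
    """Fitted value from the example: 80 + 5*HomeParent + 0.5*Held."""
    return 80 + 5 * home_parent_years + 0.5 * hours_held


# Intercept: no stay-at-home parent and never held.
baseline = predicted_intelligence(0, 0)

# Partial effects: change one variable while the other is held fixed.
extra_year = predicted_intelligence(3, 1) - predicted_intelligence(2, 1)
extra_hour = predicted_intelligence(2, 2) - predicted_intelligence(2, 1)

# Both variables change together: one more year at home and one more hour held.
both = predicted_intelligence(3, 2) - predicted_intelligence(2, 1)

print(baseline, extra_year, extra_hour, both)
```

The combined change is just the sum of the two partial effects, because the equation is linear in HomeParent and Held.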

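3.2 Interpreting the OLS Equation

-A model with k independent variables is written similarly to the 2 independent variable case:

(3.16) $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \dots + \hat{\beta}_k x_k$

-Written in terms of changes:

(3.17) $\Delta \hat{y} = \hat{\beta}_1 \Delta x_1 + \hat{\beta}_2 \Delta x_2 + \dots + \hat{\beta}_k \Delta x_k$

-If we hold all other variables (xj | j=1,2…k, j≠f) fixed, or CONTROL FOR all other variables,

(3.18') $\Delta \hat{y} = \hat{\beta}_f \Delta x_f$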

3.2 Holding Other Factors Fixed

-we’ve already seen that Bjhat examines the effect of increasing xj by one, holding all other x’s constant

-in simple regression analysis, this would require two identical observations where only xj differed

-multiple regression analysis estimates this effect without having an explicit example

-multiple regression analysis mimics a controlled experiment using nonexperimental data
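One way to see how multiple regression holds other factors fixed without matched observations is the "partialling out" result, which is not stated in the slides above but is a standard property of OLS. The slope on x1 in the multiple regression equals the slope obtained by first removing from x1 everything that x2 can explain, then regressing y on what is left. The sketch below checks this on made-up data.

```python
def simple_ols(x, y):
    """Intercept and slope from regressing y on a single x."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (c - my) for a, c in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


def ols_two_regressors(x1, x2, y):
    """Closed-form OLS for y = b0 + b1*x1 + b2*x2 + u."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return my - b1 * m1 - b2 * m2, b1, b2


# Made-up sample.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [4.1, 5.3, 9.2, 10.4, 14.9, 15.8]

# Step 1: regress x1 on x2 and keep the part of x1 that x2 cannot explain.
a0, a1 = simple_ols(x2, x1)
r1 = [a - a0 - a1 * b for a, b in zip(x1, x2)]

# Step 2: slope of y on that leftover variation in x1.
b1_partial = sum(r * yi for r, yi in zip(r1, y)) / sum(r * r for r in r1)

# Compare with the x1 slope from the full multiple regression.
_, b1_multiple, _ = ols_two_regressors(x1, x2, y)
print(b1_partial, b1_multiple)
```

The two numbers agree up to rounding, which is the sense in which B1hat uses only the variation in x1 that is unrelated to x2.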